CHAPTER 1


INTRODUCTION 

The  application  of  statistical  methods  of  data 
analysis  to  metallurgical  practices  is  not  widespread.  In 
many  cases,  data  analysis  is  done  without  using  known 
statistical  methods,  i.e.,  by  fitting  a  line  or  curve  using 
"eye-balling."  Use  of  statistical  analysis  provides  a  more 
rigorous  analysis  of  the  data,  thus  allowing  strict  and 
meaningful  comparison  of  different  data  sets.  Why,  then, 
are  statistical  methods  not  used  more  often,  thus  allowing 
more  analytical  precision? 

It appears that the dominant reason is the separation of
disciplines between Metallurgy and Statistical Methods. The
goal of this thesis is to examine the potential applications of
statistical methods to metallurgical data and to formulate
reproducible models for application. Several potential
applications will be discussed. One application was selected
for detailed development: the statistical analysis of
titanium alloy fatigue data.

Background  for  Example 

Titanium alloys are desirable materials for use in
aerospace systems. They have high specific strength
(strength-to-density ratio), excellent fracture resistance
characteristics, and outstanding general corrosion resistance.
However, despite these desirable characteristics, titanium
alloys have seen limited use in advanced systems. In fact,
rather than increasing in recent years, titanium use has
decreased. This is shown in Table 1.1, which lists titanium
use in conceptual and final design stages.

TABLE 1.1

Cost Impact of Titanium Use (9)

                     % Titanium
Aircraft     Early Concept     Final Design

F-15              50                34
B-1               42                22
C-5               24                 3

The reason for this limited use is the high cost of
titanium components, a result of high initial costs combined
with high processing costs (forging and machining). In order
to combat these high costs, the USAF has identified the cost
factors associated with the various stages of going from
titanium ore to final assembled product. These are shown in
Table 1.2 (5).

TABLE 1.2

COST BREAKDOWN OF TITANIUM COMPONENTS

Product               Current Cost [10] ($)    Added Cost ($)
                      (per pound of Ti)        (per pound of Ti)

RUTILE (TiO2) [1]     0.10-0.25                0.10-0.25
TICKLE (TiCl4) [2]    1.00-2.00                0.75-1.90
SPONGE                7.00-22.00 [3]           5.00-21.00
INGOT                 9.00 [4]                 2.00
MILL PRODUCT
  Sheet               9.00-25.00 [5]           up to 16.00
  Foil                300.00 [6]               300.00
  Forgings            10.00-15.00              up to 6.00
  Plate               12.00-17.00              up to 8.00
  Rod                 12.00-20.00              up to 11.00
  Tubing              10.00-20.00              up to 11.00
Forgings [7]          150.00-300.00            up to 300.00
Sheet Metal [8]       50.00-150.00             up to 150.00

[1] 92-98% pure.
[2] DuPont quote of 25 cents/pound; however, availability is in question.
[3] Spot market price; must be considered artificially high.
[4] Assuming triple melt.
[5] Low value CP; 6-4 in the $20-25 range depending on options
such as gage, width, lengths, finish, etc.
[6] 6-4 product.
[7] Includes secondary processing (fabrication),
machining and inspection costs. As a good rule of thumb the
cost of the product doubles during forging and doubles again
during machining (though the latter operation is highly
dependent on the proportion of rough and final machining, the
final operation being approximately 10 times as expensive per
pound of stock removed).
[8] SPF/DB projected to be 20% less expensive than
conventional fabrication.
[9] Low value welded; high value seamless.
[10] As of August/September 1980.

It is immediately obvious that component fabrication
(forging, machining into sheet, etc.) is a major cost
item. Because of this, the Materials Laboratory of the Air
Force Wright Aeronautical Laboratories (AFWAL) has sponsored
many programs in the past ten years in net shape (or near
net shape) technologies. Net shape technology involves
producing a component very close to its final shape, thus
reducing forging and machining operations and greatly
improving the material utilization factor (buy-to-fly ratio).
Of particular importance in these new technologies are the
mechanical properties of components so produced. One
test of critical importance is the fatigue test.

Fatigue testing is done on a machine which subjects
a prepared metal specimen (Figure 1.1) to cyclic stress,
generally until failure occurs. Numerous tests are done
with specimens of the same alloy at different stress levels
to provide data points for plotting Stress vs. Cycles to
Failure (S-N) curves. The curves can then be used to predict
the useful (safe) life of a component, given the
working stress level. Two important quantities can be identified
during fatigue testing: t_i, the time to initiation of a crack,
and t_p, the time of propagation. t_i is the number of cycles
until a discernible crack is detected, and t_p is the number
of cycles from t_i until the crack causes failure. For
further discussion, reference should be made to a text such
as Dieter (2:403-49).


Figure  1.1.  Fatigue  Specimen 


Metallurgical  Considerations  For  Example 


In the present program the specific alloy involved
is Ti-6Al-4V, a titanium alloy with six percent aluminum and
four percent vanadium, along with another titanium
alloy, CORONA 5. The Materials Laboratory has been investigating
this alloy in many conditions as part of the major
net-shape thrust. This includes powder metallurgy (6), cast (5),
and wrought product (5). Various conditions have been
selected for detailed examination. These materials are prealloyed
powders, elemental powders, cast alloys, and wrought
alloys, with several heat-treated conditions for those materials.

Statistical  Methods 

Currently, statistical methods are not used to develop
the S-N curves; best-fit lines are "eyeballed." Regression
analysis would provide a statistical basis for determining the
equation of the line and provide a reproducible model. This
rigorous definition of best-fit lines or curves is very important
since it allows (a) a strict definition of lower-bound
3-sigma (three standard deviation) curves, which are generally
used for design purposes, and (b) a strict analysis of whether
the data from two curves is from the same or different population
groups. This latter point is of great metallurgical
significance since it is vital to know whether one population
(process) is better than another (process). If the data is
all from one population, then there is no point in pursuing an
alternate (generally more costly) process.
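As a rough illustration of the regression idea described above, the following is a minimal sketch, not the SPSS procedure used in this work. It assumes a straight-line model log10(N) = a + b*S, which is only one common choice for S-N data. The function names and the sample stress and cycle values are hypothetical, and the "3-sigma" curve here is simply the fitted line shifted down by three residual standard deviations.

```python
import math

def fit_sn_line(stress, cycles):
    """Least-squares fit of log10(N) = a + b*S.

    Returns (a, b, s), where s is the residual standard deviation
    of log10(N) about the fitted line (n - 2 degrees of freedom).
    """
    y = [math.log10(n) for n in cycles]
    n = len(stress)
    x_mean = sum(stress) / n
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in stress)
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in zip(stress, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    sse = sum((yi - (a + b * x)) ** 2 for x, yi in zip(stress, y))
    s = math.sqrt(sse / (n - 2))
    return a, b, s

def lower_bound_log_life(a, b, s, stress_level, k=3.0):
    """log10(N) on the lower-bound curve: the fitted line minus k*s."""
    return a + b * stress_level - k * s

# Hypothetical data (stress in ksi, cycles to failure); not from the thesis.
stress_ksi = [120.0, 110.0, 100.0, 90.0, 80.0, 70.0]
cycles_to_failure = [2.0e4, 5.5e4, 1.1e5, 4.0e5, 9.0e5, 3.2e6]

a, b, s = fit_sn_line(stress_ksi, cycles_to_failure)
print("mean log10(N) at 95 ksi:", a + b * 95.0)
print("3-sigma lower bound log10(N) at 95 ksi:", lower_bound_log_life(a, b, s, 95.0))
```

Because the fit is computed rather than drawn by eye, two analysts given the same data obtain the same line and the same lower-bound curve.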


An additional problem encountered when developing
these curves is outliers, those points not close to the
best-fit lines. Determining why or even whether these data
points should be considered as outliers is of major concern.
Specifically, it is of great concern to define statistically
whether or not the data represented by these points is from
the general population or whether other metallurgical factors
are influencing this data, causing "low" or "high" values.
"Low" values can be anticipated from microstructural
discontinuities such as voids or foreign particles, while "high"
values can arise from microstructures or orientations (texture)
which are more fatigue resistant than the general body of a
component. It would be prohibitively costly to analyze every
failure, but valuable information can be gathered by analyzing
cracks resulting from abnormalities. The statistical
technique of determining outliers provides a sound basis for
identifying "high" or "low" values. Once a point is determined
to be an outlier, the specimen corresponding to that point
can be analyzed for the cause of its abnormal behavior. That
cause can be defined by microstructural or fractographic
examination, whether it is a foreign particle, a surface
crack, or some other mechanistic explanation.
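The outlier screening described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the residual analysis carried out in this work: it fits a straight line by least squares, divides each residual by the residual standard deviation, and flags points whose scaled residual exceeds a threshold. The function name, the threshold of 2.0, and the sample data are all hypothetical.

```python
import math

def flag_outliers(x, y, threshold=2.0):
    """Return indices of points whose scaled residual exceeds the threshold.

    A straight line y = a + b*x is fitted by least squares. Each residual
    is divided by the residual standard deviation (n - 2 degrees of
    freedom). Leverage corrections are deliberately omitted for brevity.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [i for i, r in enumerate(residuals) if abs(r) / s > threshold]

# Hypothetical data: points on y = 2x except the fifth, which is 10 too high.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_values = [2.0, 4.0, 6.0, 8.0, 20.0, 12.0, 14.0, 16.0]
print(flag_outliers(x_values, y_values))
```

A flagged index identifies the specimen that would be pulled for microstructural or fractographic examination.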

All  information  concerning  the  fatigue  tests  done  on 
the  various  conditions  of  Ti-6Al-4V  is  available  from  the 
Structural  Metals  Branch  of  AFWAL.  The  data  is  in  both  tabular 
and  graphical  form. 


Computer support was supplied by the ASD Computer
Center. The REGRESSION and NONLINEAR subprograms of SPSS
(Statistical Package for the Social Sciences) (13:351,368) will
be used for the best-fit line and population difference problems,
while residual analysis will be applied to the study of
the outlier problem.

Statement  of  the  Problem 

There appear to be many fruitful applications of
statistics to mechanical metallurgical data. Statistical
methods have been applied to metallurgical data, as noted
in Dieter (2:408); however, the application is not widespread,
e.g., in the examination of fatigue data by the Materials
Laboratory, AFWAL. This work will briefly discuss several
mechanical properties and analyze, as an example, data
associated with a Ti-6Al-4V alloy currently being reviewed
by the Structural Metals Branch of the AFWAL Materials Laboratory.
Statistical analysis may aid in developing more rigorous
S-N curves, allow meaningful comparison of the curves, and
determine outliers. For the latter, corresponding specimens
may then be further examined in detail. A technique known as
comparison of regressions will be used to test for differences
between processes. The existence of differences between
treatments may lead to better decision-making as applied to
the choice of processes.
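One common form of a comparison of regressions can be sketched as below. This is an assumption about the general technique, not a reproduction of the procedure developed later in this thesis: a single pooled line is fitted to both data sets, separate lines are fitted to each, and an F statistic measures how much the separate fits reduce the residual sum of squares. All names and sample values are hypothetical, and the resulting statistic would still need to be compared with a tabulated critical F value.

```python
def _line_sse(x, y):
    """Residual sum of squares about the least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def comparison_of_regressions(x1, y1, x2, y2):
    """F statistic for 'one common line' versus 'two separate lines'.

    Returns (f_stat, df_num, df_den). A large f_stat suggests the two
    data sets come from different populations (processes).
    """
    sse_separate = _line_sse(x1, y1) + _line_sse(x2, y2)
    sse_pooled = _line_sse(list(x1) + list(x2), list(y1) + list(y2))
    df_num = 2                       # one extra intercept and one extra slope
    df_den = len(x1) + len(x2) - 4   # two lines, two parameters each
    f_stat = ((sse_pooled - sse_separate) / df_num) / (sse_separate / df_den)
    return f_stat, df_num, df_den

# Hypothetical data for two processes; process B is shifted upward.
x_a = [1.0, 2.0, 3.0, 4.0, 5.0]
y_a = [1.1, 1.9, 3.2, 3.9, 5.0]
x_b = [1.0, 2.0, 3.0, 4.0, 5.0]
y_b = [3.0, 4.1, 4.9, 6.2, 7.0]
print(comparison_of_regressions(x_a, y_a, x_b, y_b))
```

If the statistic is small, the data give no statistical reason to prefer one process over the other.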
CHAPTER 2


MATERIALS  BACKGROUND 

Introduction 

The purpose of this chapter is to define and discuss
some of the basic metallurgical principles. Later, some of
these principles will be used in the development of statistical
models to represent fatigue data. First, general
mechanical properties, which relate to all materials, are
discussed, followed by a more detailed description of fatigue.
The next section is concerned with general material
strengthening, using as an example a specific titanium
alloy. Titanium is an important metal to today's aerospace
industry. Hence, at the request of the Materials Laboratory,
AFWAL, fatigue data on titanium alloys were selected
to illustrate the application of regression analysis to
metallurgy. The final sections describe the characteristics
of titanium and its alloys.


Mechanical  Properties 

Some of the most important characteristics of a
material are its mechanical properties. These properties
(e.g., yield strength, tensile strength, fatigue strength, and
toughness) will usually determine whether a material can
be used for a given application and also the working life of
a material component, be it metallic or nonmetallic.

Knowledge of a material's mechanical properties can
be most helpful when considering which material to use when,
for instance, designing a new fighter aircraft or armored
vehicle. In the fighter, high temperature performance and
low weight are two important attributes. A strong, lightweight
material which retains its good qualities at the elevated
temperatures caused by high speed air friction is needed.
This material could reduce fuel consumption by requiring
less thickness while providing the same strength as a
thicker, weaker material. In the armored vehicle, weight
may not be as important a consideration as good impact
fracture resistance. The armor should be made from a
material able to withstand large sudden impacts.

A measure of impact fracture resistance is toughness.
A common test for toughness is an impact test using a notched
specimen, a small bar of material with a V-shaped notch
located at the point where a swinging hammer impacts the
specimen (15:479).

(Underlined words are defined at the end of this chapter.)

The hammer is released from a fixed height, strikes
the specimen at the lowest point of its swing, and continues
through. The difference in height between start and finish
gives an indication of the energy expended in breaking the
material, and thereby a measure of the toughness.
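The height difference translates into energy through the change in the hammer's potential energy. The sketch below assumes an idealized pendulum with no friction or air-drag losses; the function name and the example mass and heights are hypothetical and are not taken from any test described here.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2

def absorbed_impact_energy(hammer_mass_kg, start_height_m, finish_height_m):
    """Energy (joules) absorbed in breaking the specimen.

    Idealized: the loss in potential energy of the hammer between its
    release height and the height it reaches after fracture.
    """
    return hammer_mass_kg * STANDARD_GRAVITY * (start_height_m - finish_height_m)

# Hypothetical example: a 20 kg hammer released at 1.5 m that rises to 0.5 m.
print(absorbed_impact_energy(20.0, 1.5, 0.5))
```

A tougher material absorbs more energy, so the hammer rises to a lower finishing height.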

Returning to the fighter aircraft example, the need
for a strong material is obvious, but what constitutes
a strong material? One measure of a material's strength is
tensile strength, the maximum amount of stress (units of
force per unit area) in tension or compression
a material can be subjected to before failure occurs. It is
normally measured on a machine which grips both ends of a
prepared specimen with its two crossheads, one of which
moves. As the crossheads move apart, a force results, the
specimen is pulled apart, and eventually it breaks. A
stress-strain (σ - ε) curve (Figure 2.1) graphically displays
a material's strain response to stress. Point B in Figure
2.1 identifies the location of the tensile strength. The
curve descends rapidly past this point because the material's
cross-sectional area is rapidly decreasing, and the material
can no longer withstand the greater applied load. Hence,
less stress is needed to deform a material past the tensile
strength to the fracture point. Along with tensile strength,
several other properties can be obtained from the tensile
test, including reduction of area and the elastic modulus (E),
a measure of the amount of strain (elongation per unit of
original length) associated with the early application of
stress on an unstressed material. Reduction of area is the
percentage reduction of specimen cross-sectional area
resulting from the stress necessary to cause fracture (point
C in Figure 2.1), and is a significant measure of ductility.
The elastic modulus (E), also known as Young's Modulus, is
the ratio of stress to strain over the interval where
the ratio remains relatively constant, from the initiation
of stress to just prior to the yield strength
(point A). In this range of stress (the elastic range),
once the stress is relieved the material will return to its
original dimensions. Beyond the upper limit of the elastic
range (the yield strength), plastic deformation occurs. This
is a phenomenon where a material deforms by various mechanisms
within itself and does not return to its original dimensions
after stress is removed.

Figure 2.1. Stress-Strain (σ - ε) Curve (15:446)
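The quantities read from the tensile test can be written compactly as follows. This is a sketch using standard engineering definitions; the symbols F (applied force), A_0 and A_f (original and final cross-sectional areas), and L_0 and L (original and current gage lengths) are introduced here for illustration and do not appear in the original text.

```latex
\begin{align}
\sigma &= \frac{F}{A_0}, & \varepsilon &= \frac{L - L_0}{L_0}, \\
E &= \frac{\sigma}{\varepsilon} \quad \text{(elastic range only)}, \\
\text{Reduction of area} &= \frac{A_0 - A_f}{A_0} \times 100\%.
\end{align}
```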

Another measure of material strength is yield
strength, usually the stress associated with a previously
determined amount of strain. The most commonly used yield
strength is the 0.2 percent yield strength, which is the
stress corresponding to the point of 0.002 or 0.2 percent
permanent strain (15:445). The 0.2 percent yield strength
(point A in Figure 2.1) is identified on a stress-strain
curve, obtained from a tensile test, by constructing a line
which is parallel to the Young's Modulus line and passes
through the 0.002 strain point. The point where this
constructed line intersects the σ - ε curve is the 0.2 percent
yield strength. At this point, when stress is
removed, the material returns to its original dimensions
plus 0.2 percent permanent strain. This strength is generally
located near the start of the plastic region of a
material, and is therefore a useful property to know when
subjecting a material to a stress. Some materials can be
hardened or strengthened by applying a stress which takes
them into the plastic range. Once a material is in this
range, it becomes more difficult for it to slip internally,
and it is therefore strengthened, usually with an accompanying
loss of ductility.
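The offset construction described above can be sketched numerically. The example assumes the stress-strain curve is available as sampled points with the elastic modulus already known, and it uses linear interpolation between samples. The function name and the synthetic curve are hypothetical and are not data from this study.

```python
def offset_yield_strength(strains, stresses, modulus, offset=0.002):
    """Find where the offset line sigma = E*(eps - offset) crosses the curve.

    strains and stresses are parallel lists sampled along the tensile
    curve in order of increasing strain. Returns (strain, stress) at the
    crossing, found by linear interpolation, or None if the curve never
    falls to the offset line.
    """
    def gap(i):
        # Positive while the measured curve lies above the offset line.
        return stresses[i] - modulus * (strains[i] - offset)

    for i in range(1, len(strains)):
        g0, g1 = gap(i - 1), gap(i)
        if g0 > 0 and g1 <= 0:
            t = g0 / (g0 - g1)
            strain = strains[i - 1] + t * (strains[i] - strains[i - 1])
            stress = stresses[i - 1] + t * (stresses[i] - stresses[i - 1])
            return strain, stress
    return None

# Synthetic curve (arbitrary units): elastic with E = 100 up to a strain
# of 0.01, then a much shallower hardening slope of 10.
sample_strains = [0.0, 0.005, 0.01, 0.015, 0.02]
sample_stresses = [0.0, 0.5, 1.0, 1.05, 1.10]
print(offset_yield_strength(sample_strains, sample_stresses, modulus=100.0))
```

Using a fixed offset makes the yield strength reproducible even for materials whose curves show no sharp transition from elastic to plastic behavior.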

Another  mechanical  property,  the  one  for  which 
data  is  available  for  the  present  work,  is  fatigue  strength. 
Fatigue  failure  is  responsible  for  a  large  fraction  of 
identifiable  service  failures  (15:483).  The  fatigue 
strength  is  the  stress  at  which  a  material  fractures  when 
subjected  to  repeated  stress  cycling.  A  common  method  of 
stress  cycling  is  reverse  bending,  a  technique  which 
subjects  a  specimen,  alternately,  to  tensile  stress  on  one 
side  and  compressive  stress  on  the  other  side  and  then 
compression  on  the  first  side  and  tension  on  the  other. 

An  aircraft  wing  experiences  this  effect  due  to  the 
buffeting  caused  by  air  rapidly  rushing  past  the  wing.  The 
area  where  the  wing  is  attached  to  the  frame  experiences  a 
great deal of cyclic stress. Generally, fatigue strength
(or endurance limit) is the stress which will cause fracture
at the end of a specified number of stress cycles,
normally 10^[exponent illegible] cycles (16:814). The stress levels used during
testing are generally below the yield strength. In general,
as the stress level (S) decreases, the number of cycles (N)
to failure increases. This behavior is frequently described
on a semi-logarithmic plot with an S-N curve, as shown in
Figure 2.2. Some materials, such as mild steel and polymethyl
methacrylate, exhibit a fatigue limit, a stress level
below which no number of cycles will produce failure.
However, nonferrous alloys and many polymers do not have this
limit. Fatigue strength is measured on a machine which
holds a prepared specimen by both ends and repeatedly applies
cyclic stress, generally until failure occurs. Numerous
tests are done with different specimens of the same material
at several different stress levels to determine the pattern
of fatigue behavior.
Microscopic Aspects of Fatigue


Fatigue failure can be explained in microscopic terms.
When stress is applied to a metal, the closest-packed planes
of atoms (slip planes) slip relative to one another. When
slip occurs, the crystal structure of the metal is slightly
deformed, and lines of lattice defects, known as dislocations,
are present along the slip planes (see Figure 2.3).


Figure  2.3.  Permanent  Deformation  Resulting  From  Motion 
of  a  Dislocation  (15:192) 


With  further  application  of  stress,  more  dislocations 
form  and  move  through  the  metal.  The  dislocations  then 
interact  and  may  then  attract  each  other.  Slip  bands  appear 
and  tend  to  group  into  packets  or  striations.  Extrusions, 
small  ribbons  of  metal  apparently  extruded  from  the  slip 
bands,  or  intrusions,  small  crevices,  form  in  the  slip  bands. 
Small gaps or openings can occur in these regions and can act
as  crack  initiators  (16:817). 


Another explanation of crack nucleation involves the
intersection of dislocations. When barriers to dislocation
movement exist, the dislocations pile up, and dislocations
from different slip planes may open up a small crack
(16:769).

Additional  cyclic  stress  causes  the  small  cracks  to 
grow.  At  relatively  low  stresses,  the  cracks  grow  slowly 
through  the  material  until  the  cross-section  of  the  specimen 
can no longer support the load. At this point, crack propagation
proceeds rapidly to failure. Fatigue cracks may initiate at
internal  defects,  such  as  foreign  particles;  however,  many 
cracks  start  at  surface  defects,  such  as  notches,  which  act 
as  stress  raisers. 

Material  Strengthening 

Few  materials  have  the  strength  to  withstand  the 
stresses  applied  to  an  aircraft  wing  during  high-speed 
flight.  Some  materials,  especially  metal  alloys,  have  the 
ability  to  be  strengthened  by  various  strengthening 
mechanisms.  One  of  several  strengthening  techniques  used  to 
harden a metal is heat treatment. A common method of heat
treatment is solution treating and aging (STA). Solution
treating involves heating an alloy to a temperature where
one constituent totally or partially dissolves. The
material is then aged to produce a dispersion of the second
constituent, which strengthens the material by reducing the
ability of the dislocations to move in the crystal lattice.

Many alloys have the characteristic of existing in
different crystal structures (phases) at different temperatures.
This property is called allotropy. For simplicity,
a two-phase titanium alloy, Ti-6Al-4V, whose phase diagram
is shown in Figure 2.4, will be discussed. It exists in the
alpha (α) phase at lower temperatures and the beta (β) phase
at higher temperatures.


Figure 2.4. Vertical Section of Ternary Ti-Al-V
(Constant 4% V) Phase Diagram (11:165)

Solution treatment for Ti-6Al-4V consists of raising the metal
to a temperature in the β region or high in the α + β region
and maintaining it there until the metal reaches its equilibrium
composition at that temperature. When raised to a
temperature in the β region, the alloy becomes a single
phase, β. When raised to a temperature high in the α + β
region, it remains a two-phase alloy; however, much of the
α present at room temperature is converted into β. The
metal is then quenched to a lower temperature using a liquid
cooling medium, such as water. This rapid cooling prevents
most of the β from returning to the α phase by diffusion.
After the quench, a supersaturated solution exists, in that
the amount of β present is greater than the equilibrium
amount of β at the lower temperature (16:360).

The aging treatment allows the solution to become
less supersaturated by allowing precipitation of some of the
α which was converted to β. The alloy, after quenching, is
placed in a furnace at a constant intermediate temperature
(between room temperature and the solution temperature). The
proper aging temperature is that which balances the two mechanisms
of precipitation: nucleation and growth by diffusion.
Nucleation is the formation of the beginnings of α particles,
and is favored by lower temperatures. Diffusion is the process
whereby the atoms comprising the α phase flow to the nuclei,
causing the α particles to grow at the expense of the β particles;
it is favored by higher temperatures, where atomic
motion is greater. Therefore, an intermediate temperature
provides the fastest precipitation rate, because both
nucleation and diffusion occur at moderate rates (16:361).


Precipitation of α is continued until a fine dispersion
of α in β is present. This fine dispersion of precipitated
particles increases strength by acting as a barrier to dislocation
movement. The particles cause the dislocation to
move around them in the form of expanding loops (15:371).
Only when the loops intersect can the dislocation move
on through the metal. However, the dislocation loop remaining
around the particle expands the stress field of the particle,
causing subsequent dislocations to require more stress to
move around or through them. The basis of strengthening is
that greater stress is needed to move a dislocation past a
precipitate particle than through a continuous matrix,
allowing the metal to withstand a greater amount of stress
before failure.

Table  2.1  gives  an  indication  of  the  value  of  heat 
treatment  on  two  titanium  alloys.  The  Annealed  condition  is 
one  where  the  grains  of  the  alloy  are  relatively  equivalent 
in  size  and  shape  and  are  strain-free.  The  Solution  Treated 
and  Aged  condition  is  the  heat-treated  condition,  which 
consists  of  a  fine  dispersion  of  one  phase  in  the  other 
phase.  Heat  treatment  increases  the  yield  strength  while 
reducing  ductility. 


TABLE 2.1

Comparison of Pre-Heat Treated and Post-Heat Treated Alloys

All measurements                0.2% Yield Strength    Reduction
at room temperature             (psi)                  of Area (%)

Ti-6Al-4V Bar
  Annealed                      120,000 (a)            25 (a)
  Solution Treated and Aged     145,000 (b)            20 (b)

Ti-13V-11Cr-3Al
  Annealed                      125,000 (c)            25 (c)
  Solution Treated and Aged     170,000 (d)            10 (d)

a (16:22)
b (16:30)
c (16:29)
d (16:33)


Characteristics  of  Titanium 

Titanium (chemical symbol Ti) is a highly desirable
engineering material for aerospace systems due to its combination
of properties which are important to those systems:
high strength, low density, good fracture resistance, and
excellent heat and corrosion resistance. A material's
properties at elevated temperatures are especially important
to the aircraft industry. The skin of an aircraft cruising
at Mach 2.7 can reach a temperature of 500°F due to air
friction (19:61). Of the currently commercially feasible
engineering metals, titanium has the best strength-to-weight
(yield strength:density) ratio.
TABLE 2.2

Comparison of Four Engineering Metals (d)

                               Titanium     Aluminum    Iron         Magnesium

Chemical Symbol                Ti           Al          Fe           Mg
Atomic Number                  22           13          26           12
Atomic Weight                  47.90        26.98       55.85        24.31
  (atomic mass units;
  based on Carbon=12)
Density at 20°C (g/cc)         4.51         2.70        7.86         1.74
Melting Point (°C)             1668         660         1536         650
0.2% Yield Strength (psi)      20,000 (a)   5000 (b)    18,000 (c)

a (19:60)
b (15:415)
c (15:458)
d (14:1)

These  materials,  however,  are  rarely  used  in  pure 
form  in  practical  engineering  construction.  They  are  usually 
alloyed  with  other  elements  to  enhance  certain  properties. 

For  example,  iron  is  alloyed  with  carbon  to  increase  strength 
and  alloyed  with  chromium  to  increase  corrosion  resistance. 
Aluminum  is  added  to  titanium  to  increase  strength,  while  the 
addition  of  vanadium  enhances  titanium's  ability  to  be 
strengthened  through  heat  treatment. 


A characteristic of titanium is its allotropic crystal
structure. Allotropy is the ability of a substance to
have different crystal structures at different temperatures.
Titanium exists in the hexagonal-close-packed (hcp)
structure from room temperature up to the
transformation temperature of 883°C (1621°F). Above that
temperature, the body-centered-cubic (bcc) structure is present
until the melting point. See Figure 2.5 for representations of the
two crystal structures of Ti.


Figure 2.5. Unit Cells of the Crystal Structures of
Titanium (15:156-57)


The hcp structure characterizes the alpha (α) phase
of titanium, while the bcc structure denotes the beta (β)
phase. The temperature at which α transforms to β is the
beta transus. A look at the Ti-Al binary phase diagram
(Figure 2.6) may be helpful at this point.


Characteristics of Titanium Alloy Ti-6Al-4V

The titanium alloy whose fatigue properties are
under consideration in this study is Ti-6Al-4V. This is a
titanium-based alloy with additions of between 5.50 and 6.75
percent aluminum, between 3.50 and 4.50 percent vanadium,
and traces of other elements (1:388). Table 2.3 lists some
of the properties of Ti-6Al-4V, as compared to those of
Aluminum 7075 alloy and 4130 Steel.

TABLE 2.3

Comparison of Three Engineering Alloys (19:61)

                               Ti-6Al-4V     Al 7075      4130 Steel

Density at 20°C (g/cc)         4.43          2.77         7.83
0.2% Yield Strength (psi)      125,000       70,000       160,000
Melting Range (°C)             1604-1671     477-638      1504
              (°F)             2919-3040     891-1180     2739

In addition, Ti-6Al-4V exhibits good machinability
and weldability. It is readily weldable, producing
a weld with properties close to those of the base metal
(17:13). Ti-6Al-4V is the most versatile and most widely
used of the titanium alloys. Some of its uses include
airplane turbine disks and blades, aircraft components,
pressure vessels, rocket engine cases, and chemical processing
equipment (17:22).
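The strength-to-weight comparison implied by Table 2.3 can be worked out directly from the tabulated values. The sketch below divides 0.2% yield strength by density for each alloy. The units are mixed (psi divided by g/cc), so the results are meaningful only for ranking the three alloys against one another; the variable and function names are illustrative.

```python
# Values taken from Table 2.3: (density in g/cc, 0.2% yield strength in psi).
ALLOYS = {
    "Ti-6Al-4V": (4.43, 125000.0),
    "Al 7075": (2.77, 70000.0),
    "4130 Steel": (7.83, 160000.0),
}

def specific_strength(density_g_per_cc, yield_strength_psi):
    """Yield strength divided by density, for relative comparison only."""
    return yield_strength_psi / density_g_per_cc

# Sort the alloys from highest to lowest strength-to-density ratio.
ranking = sorted(
    ALLOYS,
    key=lambda name: specific_strength(*ALLOYS[name]),
    reverse=True,
)
for name in ranking:
    print(name, round(specific_strength(*ALLOYS[name])))
```

On these figures the steel has the highest yield strength, but Ti-6Al-4V has the highest ratio of yield strength to density.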

Through alloying, three classes of titanium alloys
are possible: alpha alloys, α + β alloys, and beta alloys.
α alloys are formed by additions of α stabilizers. α
stabilizers are those elements whose addition causes the beta
transus to increase, allowing the α phase to exist at a
higher temperature. Some α stabilizers are aluminum, oxygen,
carbon, and nitrogen; these elements generally increase
strength and decrease ductility. Additions of aluminum can
be made until about 8 percent, where the alloy becomes
extremely brittle and difficult to work with (19:10) because
of the formation of the intermetallic Ti3Al phase.

β alloys are formed by additions of β stabilizers,
those elements which stabilize the β phase at lower temperatures
by lowering the beta transus. Some common β
stabilizers are vanadium, tantalum, molybdenum, chromium,
and hydrogen. β stabilizers also increase the strength of
the basic titanium somewhat, but their strong point lies in
the fact that β alloys can be heat treated to increase
strength.

The largest group of titanium alloys is the $\alpha$ + $\beta$
group. These alloys usually contain both $\alpha$ and $\beta$ stabilizers.
Generally, mechanical behavior remains stable even after
exposure to high temperature and stress. Ti-6Al-4V belongs
in this group.

Summary 

Knowing a material's mechanical behavior can help
when considering choices of materials to use for a weapon
system. Fatigue strength is important to know for an
aircraft, which is subject to repeated cyclic stress.
Fatigue failure can be explained by dislocation movement, which
leads to crack formation.

Titanium and its widely used alloy, Ti-6Al-4V, are
high strength materials. Heat treating allows Ti-6Al-4V to
achieve greater strength.

Definitions  of  Key  Terms 

1.  Alloy--a solid solution of two or more metals

2.  Allotropy--ability of an element to exist in more than
    one crystal structure

3.  Alpha Phase ($\alpha$)--a distinct solid crystal structure; in
    titanium alloys, it is the low temperature structure
    of pure titanium, hexagonal-close-packed

4.  Annealing--By holding a material at a relatively high
    temperature, the strain (dislocations) is removed
    from material which has been deformed (worked). By
    raising the material above its recrystallization
    temperature, strain free grains of relatively equiaxed
    shape are produced.

5.  Beta Phase ($\beta$)--a second distinct solid crystal
    structure; in titanium alloys, it is the high temperature
    structure of pure titanium, body-centered-cubic

6.  Beta Transus--the temperature at which all alpha phase
    has transformed to beta phase; connecting these
    points for various alloy compositions constructs the
    boundary between the $\alpha$ + $\beta$ region and the $\beta$ region

7.  Crystal Structure--the atomic arrangement inside a
    material's crystals; the three relatively simple
    types of crystal structure for most metals are:
    face-centered-cubic, body-centered-cubic, and
    close-packed-hexagonal

8.  Diffusion--flow of atoms; in the solid state this involves
    the movement of atoms within a crystalline structure

9.  Dislocation--a line of crystal structure defects formed
    at the edge of an atomic plane

10. Ductility--the measure of a material's flexibility or
    ability to accommodate strain

11. Elastic Modulus (Young's Modulus, E)--the constant
    ratio of stress to strain in the elastic range, that
    region of the $\sigma$-$\epsilon$ curve from initial loading to near
    the yield strength; $E = \sigma/\epsilon$

12. Fatigue Strength--the stress that causes failure at the
    end of a specified number of cycles of alternating
    stress

13. Nucleation--the formation of nuclei, small "seeds" from
    which particles grow

14. Phase--a macroscopically distinct homogeneous body of
    matter, such as liquid, gas, or different crystal
    structures as a solid

15. Plastic--the region of the $\sigma$-$\epsilon$ curve where the specimen
    does not return to its original dimensions upon relief
    of stress; it suffers some permanent strain

16. Slip Planes--normally the atomic planes of highest
    density; upon application of stress, these planes
    move relative to one another

17. Strain ($\epsilon$)--elongation per unit length of a material in
    response to an applied stress

18. Stress ($\sigma$)--force per unit area acting on a surface

19. Tensile Strength--the maximum stress, measured over the
    original specimen cross-sectional area, a material
    can withstand in a tensile test

20. Toughness--fracture resistance of a material upon impact

21. Yield Strength--the point on a $\sigma$-$\epsilon$ curve to which, if
    a specimen is loaded and then unloaded, it would
    suffer a permanent strain of some specified value,
    usually 0.002 (0.2%)


CHAPTER 3


STATISTICAL  BACKGROUND 

Introduction 

From  Chapter  2,  it  can  be  seen  that  mechanical 
properties  of  materials  are  represented  as  relationships  of 
two  or  more  quantities  (e.g.,  stress  vs.  strain)  on  graphs 
with  curves  which  describe  a  material’s  mechanical  behavior 
over  a  range  of  values.  Statistical  methods  offer  a  much 
more  precise  and  reproducible  means  of  representing  these 
relationships.  This  chapter  is  intended  to  introduce  the 
non-statistician to some important statistical concepts. In
particular, regression, an approach to curve fitting, is
discussed along with some techniques for testing the
appropriateness of the regression model.

Introduction  of  Regression 

In  many  situations  it  is  desirable  to  express  a 
relationship  between  two  or  more  quantities.  For  instance, 
an  aircraft  manufacturer's  engineering  shop  may  be  interested 
in  knowing  the  relationship  between  the  useful  life  of  a 
material  and  various  conditions  of  stress.  Knowing  this 
relationship  would  enable  the  engineers  to  estimate  the  useful 
life given a certain level of working stress.

One branch of statistical analysis which deals with
the description of the relationship between variable
quantities and also with the prediction of the value of an
unknown variable using known values of other variables is
regression analysis. Regression analysis concerns estimating
the value of one variable, the dependent variable, on the
basis of one or more other variables called independent
variables (8:357).

Estimates  of  the  exact  values  of  dependent  variables  are 
difficult  to  determine  due  to  the  many  factors  which  could  cause 
variations.  Regression  analysis  determines  the  average 
relationship  between  dependent  and  independent  variables; 
that  is,  it  estimates  the  mean  value  of  a  dependent  variable 
for  a  given  value  of  the  independent  variables. 

The  simplest  relationship  that  can  be  hypothesized 
between  the  means  of  the  independent  variable  and  the 
dependent  variable  is  the  following  linear  form,  described 
by  the  equation  of  the  population  regression  line: 

(1)  $\mu_{y \cdot x} = \alpha + \beta x$

where $\mu_{y \cdot x}$ = the mean value of y, the dependent
      variable, for a given value of x, the
      independent variable

      $\alpha$ = the y-axis intercept

and   $\beta$ = the slope of the population regression
      line (8:359).


The many minor factors causing variations about the
mean for a single observation can be represented by a random
error term, $\epsilon$, where $\epsilon_i = y_i - \mu_{y \cdot x_i}$, the difference between the
actual dependent variable value and the mean value of y for a
particular value of the independent variable. The population
linear regression model is:

(2)  $y_i = \alpha + \beta x_i + \epsilon_i$

where $y_i$ = the actual dependent variable value for
      a particular independent variable value, $x_i$

and   $\epsilon_i$ = the random error at $x_i$ (8:360).

The values of $\alpha$ and $\beta$ are seldom accurately known
and therefore must be estimated using sample data extracted
from the population. The following equation may be used to
describe the sample regression line when a sample of n
observations is taken:

(3)  $\hat{y} = a + bx$

where $\hat{y}$ = the estimate of $\mu_{y \cdot x}$

      a = the estimate of the y-intercept, $\alpha$

and   b = the estimate of the slope, $\beta$ (8:362).

Least-Squares  Estimation 

The most widely used method of finding the best
estimates of the regression coefficients is the method of
least-squares. This process involves minimizing the sum of
the squared differences between the actual y-value, $y_i$, and the
estimate, $\hat{y}_i$, as found by the sample regression line, for
every $x_i$ in the sample; that is, minimizing
$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2$.

This quantity is then differentiated with respect to
a and b, with $y_i$ and $x_i$ treated as constants, and the
derivatives are set to zero. The resulting equations, called the
normal equations, are then solved for a and b. The values of
a and b which satisfy the normal equations are estimates of
$\alpha$ and $\beta$.


(8)  $a = \frac{\sum_{i=1}^{n} y_i}{n} - b \, \frac{\sum_{i=1}^{n} x_i}{n}$

The estimators a and b are functions of the random
variable $y_i$ and therefore are random variables themselves.
They are unbiased estimators of $\alpha$ and $\beta$ because their expected
values equal those of the population parameters ($E(a) = \alpha$;
$E(b) = \beta$). These estimators also have a variance associated
with them that is a function of the variance of the random
variable $y_i$.

How good are the least-squares estimates of $\alpha$ and $\beta$?
Are there other methods of estimation, e.g., the method of
moments, which consistently yield estimators of the regression
coefficients with smaller variance? The Gauss-Markov theorem
states that if we assume:

1.  the random errors, $\epsilon_i$, have a mean of zero,

2.  the random errors have a constant finite variance,
    $\sigma_\epsilon^2$, for all values of x, and

3.  the random errors are independent of each other

then the estimators of $\alpha$, $\beta$, and $\mu_{y \cdot x}$ determined by the
least-squares criterion are Best Linear Unbiased Estimates
(i.e., BLUE) (8:366).

Definition of key terms is necessary to fully
understand the importance of this theorem. Linear means that the
estimators are straight-line (linear) functions of the
values of the dependent variable. The best estimators are
those which are efficient, which means the variance of the
estimators found by the least-squares method is less than the
variance of estimators found by any other linear unbiased
estimating technique (8:231-34).

Hypothesis  Testing 

Assessing the appropriateness of the regression
model can be accomplished through the use of tests of
statistical hypotheses. Two hypotheses are presented: the
null hypothesis ($H_0$) and the alternative hypothesis ($H_a$).
The null hypothesis was originally labeled as such because
it specified values of a parameter which the tester thought
were not true. Correspondingly, the alternative hypothesis
represented the values of the parameter believed to be true.
These labels have no special meaning nowadays and are only
labels for the two necessarily conflicting hypotheses (8:264).

For the linear regression model, one frequently is
interested in testing the hypothesis that the slope equals
zero ($H_0$: $\beta = 0$; $H_a$: $\beta \neq 0$). If the null hypothesis is true,
there is no linear relationship between y and x; that is,
$\mu_{y \cdot x} = \alpha + (0)x = \alpha$. Hence, for any value of x, the prediction
of y is $\hat{y} = a$.

Before testing the hypothesis it is convenient to
assume that:

Assumption 4. The random errors are normally
distributed.

Recall that a linear combination of independent
normal random variables is normally distributed. Hence, the
$y_i$'s are normally distributed (12:56). From equation (7) we
note that b is a linear combination of the $y_i$'s,
and is therefore normally distributed. The assumptions of
mean of zero, constant variance, and normality are shown
graphically in Figure 3.1.


The variance of b can be found to be:

(9)  $\sigma_b^2 = \frac{\sigma_\epsilon^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

where $\sigma_b^2$ = the variance of b

      $\sigma_\epsilon^2$ = the variance of $\epsilon$

and   $\bar{x}$ = the average value of x (12:56)


Note that $\sigma_\epsilon^2$ is an unknown population parameter.
It can be shown that

      $s_\epsilon^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{y}_i)^2}{n-2}$

is an unbiased estimate of $\sigma_\epsilon^2$. This estimate of $\sigma_\epsilon^2$ is
often called the Mean Square Error (MSE). MSE is the
quantity minimized in least-squares estimation, $\sum (y_i - a - bx_i)^2$,
divided by the appropriate degrees of freedom (n-2). Two
degrees of freedom are lost because two parameters, $\alpha$ and $\beta$,
are being estimated. An unbiased estimate of $\sigma_b^2$ is:

      $s_b^2 = \frac{MSE}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

where $s_b^2$ = the estimate of $\sigma_b^2$

and   MSE = mean square error, an unbiased estimate
      of $\sigma_\epsilon^2$ (12:56).

It can be shown that the standardized statistic
$\frac{b - \beta}{s_b}$ follows the t-distribution with n-2 degrees of freedom.
This allows the following probability statement to be made:

(11)  $P\left(-t_{(1-\alpha/2,\, n-2)} \le \frac{b - \beta}{s_b} \le t_{(1-\alpha/2,\, n-2)}\right) = 1 - \alpha$

where $t_{(1-\alpha/2,\, n-2)}$ = the $(1 - \alpha/2) \cdot 100$ percentile of the
      t-distribution with n-2 degrees
      of freedom

and   $\alpha$ = the probability (significance
      level) desired (12:58)

Rearranging the inequalities gives a $1 - \alpha$ confidence
interval for $\beta$:

(12)  $b - t_{(1-\alpha/2,\, n-2)} \, s_b \le \beta \le b + t_{(1-\alpha/2,\, n-2)} \, s_b$

A $1 - \alpha$ confidence interval represents $1 - \alpha$
confidence that the population parameter lies within the
interval determined by equation (12). If the interval contains
zero, the null hypothesis for the linear regression model
($H_0$: $\beta = 0$) cannot be rejected at the significance level
specified. Hence, we have no reason to believe that knowledge
of x is useful in estimating a value for y.
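
The interval test can be sketched in a few lines of Python. This is an illustrative example only: the function name `slope_inference` and the data are hypothetical, and the critical value is supplied by the caller from a t-table (3.182 is the tabulated 97.5th percentile for 3 degrees of freedom) rather than computed.

```python
import math


def slope_inference(xs, ys, t_crit):
    """Standard error of b, the statistic b/s_b, and the interval of equation (12)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)            # two parameters estimated, so n - 2 degrees of freedom
    s_b = math.sqrt(mse / s_xx)    # square root of MSE / sum (x_i - x_bar)^2
    lower, upper = b - t_crit * s_b, b + t_crit * s_b
    return {
        "b": b,
        "s_b": s_b,
        "t_star": b / s_b,
        "interval": (lower, upper),
        # H0: beta = 0 is rejected only when zero lies outside the interval
        "reject_h0": not (lower <= 0.0 <= upper),
    }


# Hypothetical data; t_crit = t(0.975; 3) = 3.182 from a standard t-table.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
result = slope_inference(xs, ys, t_crit=3.182)
```

With these numbers the 95 percent interval for the slope is roughly 1.84 to 2.08. Zero lies outside it, so the hypothesis of zero slope would be rejected at the 0.05 level.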

Note: This $\alpha$, the significance level, is not the same as the
population regression coefficient $\alpha$ denoting the y-intercept.

An equivalent test for rejecting $\beta = 0$ involves a
t-statistic, $t^* = \frac{b}{s_b}$. The null hypothesis cannot be rejected
at a significance level, $\alpha$, if $|t^*| < t_{(1-\alpha/2,\, n-2)}$ (12:61).

The F-statistic can also be used to test the regression
model. It can be calculated using an approach known as the
analysis of variance (ANOVA). It analyzes the total
deviation of $y_i$ from the mean of all the y values, $\bar{y}$. The
total deviation is divided into two other deviations, one of
which is the unexplained deviation ($y_i - \hat{y}_i$) and the other is
the explained deviation ($\hat{y}_i - \bar{y}$) (8:375). Note Figure 3.2 for
an illustration of these deviations.


The analysis of variance approach is based upon the
partitioning of the sums of squared deviations. The sum of
the squared deviations can be partitioned as follows:

(13)  $\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$

      SS Total = SSE + SSR    (12:79)

SSE is the sum of the squared deviations of the
observed values from the fitted values. The ratio of SSR
to SS Total is a measure of how much of the variability of the
$y_i$'s is explained by the regression line. This ratio
is known as the coefficient of determination and is denoted
as $R^2$. The range of $R^2$ is zero to one. When $R^2 = 0$, no
linear relationship between y and x exists. If $R^2 = 1$, the
observed $y_i$'s all lie on the regression line. In this case,
the linear regression line explains the relationship between
y and x perfectly. Thus $R^2$ is a measure of the usefulness
of the $\beta$ term (4:62).
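
The partition in equation (13) can be checked numerically. The sketch below is illustrative only; the function name `anova_partition` and the data are hypothetical, and the line is fitted with the standard closed-form least-squares estimates.

```python
def anova_partition(xs, ys):
    """Return (SS Total, SSE, SSR, R^2) for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    ss_total = sum((y - y_bar) ** 2 for y in ys)             # total deviation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))      # unexplained deviation
    ssr = sum((f - y_bar) ** 2 for f in fitted)              # explained deviation
    return ss_total, sse, ssr, ssr / ss_total


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
ss_total, sse, ssr, r_squared = anova_partition(xs, ys)
```

Here SS Total is about 38.46, SSE about 0.044, and SSR about 38.416, so the two parts add to the total and R^2 is about 0.9989.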

Dividing SSR and SSE by the variance ($\sigma_\epsilon^2$) produces
two independent chi-square ($\chi^2$) distributed random variables.
The ratio of two independent $\chi^2$ random variables, each divided by
its respective degrees of freedom, defines the F-distribution.
The degrees of freedom associated with SSR is one, the
number of independent variables. Hence, the F-statistic
is the ratio of the mean square regression (MSR) to the
mean square error (MSE):

(14)  $F^* = \frac{SSR / 1}{SSE / (n-2)} = \frac{MSR}{MSE}$

If $F^* < F_{(1-\alpha,\, 1,\, n-2)}$, the null hypothesis of $\beta = 0$
cannot be rejected. A relationship exists between the t and
F distributions when testing the null hypothesis $H_0$: $\beta = 0$:

(15)  $F_{(1-\alpha,\, 1,\, n-2)} = \left[ t_{(1-\alpha/2,\, n-2)} \right]^2$    (12:87)
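
A short numerical sketch can make the link between the two tests concrete. It is illustrative only: the function name `f_and_t_statistics` and the data are hypothetical. It computes F* from equation (14) and the statistic t* = b/s_b, and for a single independent variable the two satisfy F* = (t*)^2, the sample counterpart of equation (15).

```python
import math


def f_and_t_statistics(xs, ys):
    """Return (F*, t*) for the test of H0: beta = 0 in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    msr = sum((f - y_bar) ** 2 for f in fitted) / 1                  # one independent variable
    mse = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (n - 2)
    f_star = msr / mse                                               # equation (14)
    t_star = b / math.sqrt(mse / s_xx)
    return f_star, t_star


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
f_star, t_star = f_and_t_statistics(xs, ys)
```

For these data F* is about 2619, far above the tabulated F(0.95; 1, 3) of about 10.13, so the two tests agree in rejecting a zero slope.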


Confidence  Intervals 

The assumption that the $y_i$'s are normally
distributed allows confidence intervals to be calculated
about other population parameters as well. Confidence
intervals around the population mean for a particular value
of x, $x_h$, denoted $\mu_{y \cdot x_h}$, provide us with an indication of how much
"faith" we can put in our estimate of the regression line.
The estimated variance associated with $\hat{y}_h$, the fitted value
at $x_h$, is:

(16)  $s^2(\hat{y}_h) = MSE \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]$

The associated confidence interval is:

(17)  $\hat{y}_h - t_{(1-\alpha/2,\, n-2)} \, s(\hat{y}_h) \le \mu_{y \cdot x_h} \le \hat{y}_h + t_{(1-\alpha/2,\, n-2)} \, s(\hat{y}_h)$

where $\hat{y}_h$ = fitted dependent variable value at $x_h$

      $s(\hat{y}_h)$ = square root of the variance defined
      above

and   $\mu_{y \cdot x_h}$ = population parameter (12:68)
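
Equations (16) and (17) translate directly into code. The sketch below is illustrative: the function name `mean_response_interval` and the data are hypothetical, and the t percentile is taken from a table by the caller.

```python
import math


def mean_response_interval(xs, ys, x_h, t_crit):
    """Confidence limits for the mean of y at x_h, following equations (16) and (17)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    mse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    y_hat_h = a + b * x_h
    s_y_hat = math.sqrt(mse * (1.0 / n + (x_h - x_bar) ** 2 / s_xx))   # equation (16)
    return y_hat_h - t_crit * s_y_hat, y_hat_h + t_crit * s_y_hat      # equation (17)


# Hypothetical data; t_crit = t(0.975; 3) = 3.182 from a standard t-table.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
at_mean = mean_response_interval(xs, ys, x_h=3.0, t_crit=3.182)
at_edge = mean_response_interval(xs, ys, x_h=5.0, t_crit=3.182)
```

The interval at x_h = 3, the mean of the x values, is narrower than the one at x_h = 5, because the second term of equation (16) vanishes at the mean.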


Prediction  Intervals 

Another interval can be constructed based on the
predicted value for a new observation of y at $x_h$. We can
determine an interval which, with a specified probability,
will contain a random observation on y taken at $x_h$. This
interval is called a prediction interval.

Since the prediction interval is calculated for
a single new observation rather than for a mean, its variance is
larger than the variance used for estimating the mean value. The
variance derives from two sources: the variance of the sampling
distribution of $\hat{y}_h$ and the variance of the distribution of
y (12:72). The variance is estimated as:


(18)  $s^2(y_{h(new)}) = MSE \left[ 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]$

where $s^2(y_{h(new)})$ = the variance for the prediction
      interval

      MSE = mean square error

      n = number of observations, excluding
      $y_{h(new)}$, in sample

      $x_h$ = the value of the independent
      variable where the new observation is
      to be obtained

      $\bar{x}$ = mean of all $x_i$'s

and   $x_i$ = the independent variable values
      in the original sample, i = 1, 2, ..., n (12:72)


This variance is used to calculate the $1 - \alpha$
prediction interval for $y_{h(new)}$. The limits of the prediction
interval are given as:

(19)  $\hat{y}_h \pm t_{(1-\alpha/2,\, n-2)} \, s(y_{h(new)})$

where $\hat{y}_h$ = fitted dependent variable value
      at $x_h$ ($\hat{y}_h = a + bx_h$)

and   $s(y_{h(new)})$ = square root of variance defined
      above (10:58)

By calculating these limits at many $x_h$'s and
connecting the points, prediction bands can be determined for
any probability desired.

Due to the larger variability inherent in the
prediction interval calculations, the prediction bands are
wider about the regression line than the confidence bands,
as Figure 3.3 shows.


Figure 3.3.  Example of Confidence and Prediction Bands


An additional feature of the bands which results from
the calculation of variance is that the bands are narrowest at
the mean values ($\bar{x}$, $\bar{y}$) of x and y. This occurrence shows that
the least-squares fitted line is most accurate at ($\bar{x}$, $\bar{y}$)
since it represents the average relationship of the
independent and dependent variables (8:386).
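
The two kinds of band can be compared numerically. The following sketch is illustrative only; the function name `band_half_widths` and the data are hypothetical. It returns the half-widths of the confidence interval (from equation (16)) and of the prediction interval (from equation (18)) at a chosen x_h, with the t percentile supplied from a table.

```python
import math


def band_half_widths(xs, ys, x_h, t_crit):
    """Return (y_hat_h, confidence half-width, prediction half-width) at x_h."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    mse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    spread = (x_h - x_bar) ** 2 / s_xx
    conf = t_crit * math.sqrt(mse * (1.0 / n + spread))         # equation (16)
    pred = t_crit * math.sqrt(mse * (1.0 + 1.0 / n + spread))   # equation (18)
    return a + b * x_h, conf, pred


# Hypothetical data; t_crit = t(0.975; 3) = 3.182 from a standard t-table.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
bands = {x_h: band_half_widths(xs, ys, x_h, 3.182) for x_h in (1.0, 3.0, 5.0)}
```

At every x_h the prediction half-width exceeds the confidence half-width, and both are smallest at x_h = 3, the mean of the x values.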


Residual  Analysis 


The appropriateness of the regression model and the
related tests depend to varying degrees on how well the data
satisfy the assumptions made about the random error terms.
Residual analysis is a technique used to assess the
reasonableness of the assumptions of normality and constant
variance made for the random error terms. The residual is
the difference between the observed value and the predicted
value of y: $e_i = y_i - \hat{y}_i$. The residuals may be transformed
to standardized residuals by dividing each residual by the
square root of the mean square error.

A direct and revealing, though subjective, method
for examining residuals is the graphical approach. Several
plots of residuals can be constructed to note the behavior
of the residuals against: the dependent variable, time, the
independent variables, or any other variable which might
provide information about the model.

To check the assumption that the error terms have a
constant variance, $\sigma_\epsilon^2$, we plot the standardized residuals
vs. the independent variables. If the plot is characterized
by a band about zero with no systematic tendencies toward
either positive or negative values, then there is no reason
to suspect that the constant variance assumption is
violated (12:102). See Figure 3.4.


Figure  3.4.  Standardized  Residual  Plot  vs.  Independent 
Variable 


The tests of hypotheses and confidence intervals
associated with the population parameters were based on the
assumption that the $\epsilon_i$'s are normally distributed (Assumption
4). If normality holds, about sixty-eight percent of the
points should be within one standard deviation (S) of the
mean, zero, and approximately ninety-five percent of the
points should be contained within the interval of +2S to -2S.
This is the shaded area in Figure 3.4 (10:239).
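
The residual checks described above can be computed as well as plotted. The sketch below is illustrative only: the function names `standardized_residuals` and `share_within` and the data are hypothetical. Each residual is divided by the square root of MSE, and the share of points inside one and two standard deviations is counted for comparison with the rough 68 and 95 percent guides.

```python
import math


def standardized_residuals(xs, ys):
    """Residuals of the least-squares line divided by sqrt(MSE)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return [e / s for e in residuals]


def share_within(values, k):
    """Fraction of values lying within +/- k."""
    return sum(1 for v in values if abs(v) <= k) / len(values)


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
std_res = standardized_residuals(xs, ys)
within_one = share_within(std_res, 1.0)
within_two = share_within(std_res, 2.0)
```

With only five points the shares are coarse (four of five within one standard deviation, all five within two), so such counts are a rough guide that becomes more informative as the sample grows.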

Testing Random Error Assumptions


Several formal tests are also applicable for judging
the reasonableness of the random error assumptions. When a
residual plot shows the variance increasing or decreasing
in a systematic manner in relation to x or y, a test may be
employed which fits separate regression functions to each half
of the observations arranged by level of x, calculates mean
square errors for each half, and tests for equality of error
variances by an F-test (12:112). Goodness-of-fit
tests can be used for examining the normality of the
error terms (8:523). When a residual plot shows the error
terms increasing or decreasing in a systematic fashion
with time, the error terms may be considered correlated,
violating the independence assumption.
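
The split-sample test for constant variance can be sketched as follows. This is an illustrative reading of the procedure, not a definitive implementation: the function names and the eight data points are hypothetical, the observations are split at the midpoint after sorting by x, and the larger mean square error is placed in the numerator so that the ratio can be compared with a tabulated F value chosen by the analyst.

```python
def fit_line(xs, ys):
    """Closed-form least-squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b * x_bar, b


def split_half_variance_ratio(xs, ys):
    """Return (F ratio, numerator df, denominator df) from separate fits to each half."""
    pairs = sorted(zip(xs, ys))            # arrange observations by level of x
    half = len(pairs) // 2
    results = []
    for group in (pairs[:half], pairs[half:]):
        gx = [p[0] for p in group]
        gy = [p[1] for p in group]
        a, b = fit_line(gx, gy)
        sse = sum((y - a - b * x) ** 2 for x, y in group)
        results.append((sse / (len(group) - 2), len(group) - 2))
    (mse_1, df_1), (mse_2, df_2) = sorted(results, reverse=True)
    return mse_1 / mse_2, df_1, df_2       # larger MSE over smaller MSE


# Hypothetical data in which the scatter grows with x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.1, 7.9, 10.5, 11.0, 15.0, 15.2]
ratio, df_num, df_den = split_half_variance_ratio(xs, ys)
```

For these numbers the ratio is about 84 on (2, 2) degrees of freedom, well above the tabulated F(0.95; 2, 2) of 19.0, which would cast doubt on the constant variance assumption. Each half needs at least three observations, and the halves must not fit perfectly, for the ratio to be defined.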

A remedy for nonconstancy of variance is to modify
the regression model using transformations of y or x or
both to stabilize the variance. Frequently, lack of
normality accompanies nonconstant error variances, so that
a transformation which stabilizes the variances often
aids in making the distribution of the error terms normal.

Even though a transformation may not be successful in
stabilizing the variance or the normality, a slight violation of
these two assumptions need not cause the model to be
discarded (4:59). The problem of nonindependence,
however, is not easily solved by a simple transformation.
If the independence assumption is not met, the model may
have to be discarded. The least-squares estimating technique
no longer produces the best linear unbiased estimators. A
new model which works with correlated error terms should
then be used (12:122).

Outliers 

An interesting point shown in Figure 3.4 is the
residual associated with $x_i = 14$. This point may be
considered an outlier, an extreme observation or one much
larger than the others in absolute value. Frequently
outliers are classified as those points which lie three, four,
or more standard deviations from the mean. Since the
least-squares method of estimation seeks to minimize the sum of
the squared errors, the presence of outliers affects the
fitted line by pulling it toward the outlier. The "pulled"
line then may not truly represent the relationship between
y and x.

Various rules have been proposed for identifying
outliers. One method involves fitting a new regression line
based on the other (n-1) observations, omitting the outlier
data value. The outlier is then reintroduced and
treated as a new observation. The probability of obtaining
an observation from the fitted line as large as that of the
outlier in n-1 observations can then be calculated. A very
small probability--perhaps less than five percent--would
cause the outlier to be rejected as not coming from the
same population as the other n-1 observations. A larger
probability--greater than five percent--would allow the
outlier to be retained (12:112).

A corollary of that method involves fixing the
probability desired and calculating a $1 - \alpha$ prediction interval
about the estimated value of the dependent variable. If the
outlier lies within the limits of the prediction interval,
it is retained.
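
The prediction-interval rule can be sketched in Python. The example is illustrative only: the function name `outlier_check` and the data are hypothetical. The suspect point is omitted, the line is refitted to the remaining observations, and the prediction variance of equation (18) is used with a t percentile supplied from a table for the reduced sample.

```python
import math


def outlier_check(xs, ys, suspect_index, t_crit):
    """Return (lower, upper, retained) for the observation at suspect_index."""
    keep = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i != suspect_index]
    n = len(keep)                                   # the other n - 1 observations
    x_bar = sum(x for x, _ in keep) / n
    y_bar = sum(y for _, y in keep) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in keep)
    b = sum((x - x_bar) * (y - y_bar) for x, y in keep) / s_xx
    a = y_bar - b * x_bar
    mse = sum((y - a - b * x) ** 2 for x, y in keep) / (n - 2)
    x_h, y_h = xs[suspect_index], ys[suspect_index]
    s_pred = math.sqrt(mse * (1.0 + 1.0 / n + (x_h - x_bar) ** 2 / s_xx))  # equation (18)
    lower = a + b * x_h - t_crit * s_pred
    upper = a + b * x_h + t_crit * s_pred
    return lower, upper, lower <= y_h <= upper


# Hypothetical data; the last point is the suspect observation.
# Five points remain after omission, so t(0.975; 3) = 3.182 is used.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
suspect = outlier_check(xs, [2.2, 4.1, 6.2, 7.9, 10.1, 20.0], 5, t_crit=3.182)
ordinary = outlier_check(xs, [2.2, 4.1, 6.2, 7.9, 10.1, 12.1], 5, t_crit=3.182)
```

The refitted line gives prediction limits of roughly 11.42 to 12.54 at x = 6, so an observed value of 20.0 would be flagged while 12.1 would be retained.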

Several other treatments of outliers are available.
However, before rejecting outliers, their causes should be
examined carefully. Outliers may be the result of an error
in recording, a miscalculation, an equipment malfunction, or
could be an indication of an inaccuracy in the original
model. For this reason, outliers should not be discarded
immediately unless it is known that they are the result of
an error in experimentation and not an error in the
formulation of the model.

Comparison  of  Regressions 

At times different sample regression lines are
determined by drawing from different but potentially similar
populations. It may be interesting then to compare the
different lines by analyzing differences, if any, that exist
in slope and y-intercept. For instance, fatigue data may
be determined for specimens made of cast iron bar and
stainless steel bar. In this example, cast iron and stainless
steel are the separate populations. We may be interested
in determining which population is more fatigue
resistant.

One method for comparing regression models uses the
concept of qualitative variables. Indicator or dummy
variables are qualitative variables that can take on values
of 0 or 1. To compare regressions we pool the observations
(specimens). Indicator variables are then used in the
overall regression model to indicate the population observed
(i.e., cast iron or stainless steel) for each specimen.

The regression model for comparing y-intercepts
assuming equal slopes for the fatigue data example becomes:

(20)  $y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i$

where $y_i$ = number of cycles to failure for ith specimen

      $\beta_1$ = common slope

      $x_{1i}$ = stress level for ith specimen

      $\beta_2$ = population regression coefficient

and   $x_{2i}$ = 1 if ith specimen is cast iron,
             0 if ith specimen is stainless steel

In the example, when a specimen is taken from the
cast iron population, the y-intercept term is $\alpha + \beta_2$:

      $y_i = (\alpha + \beta_2) + \beta_1 x_{1i} + \epsilon_i$

If the specimen is from the stainless steel
population, the y-intercept term is $\alpha$:

      $y_i = \alpha + \beta_1 x_{1i} + \epsilon_i$

If the y-intercepts are not significantly different,
then we fail to reject $H_0$: $\beta_2 = 0$.

If the slopes are thought to be different for the
separate populations, then an appropriate model is:

      $y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i} + \epsilon_i$    (12:304)

The cross-product interaction term causes the
regression equation to include $\beta_2$ and $\beta_3$ when specimens from
the cast iron population are used:

      $y_i = (\alpha + \beta_2) + (\beta_1 + \beta_3) x_{1i} + \epsilon_i$

If the specimen is from the stainless steel
population, then:

      $y_i = \alpha + \beta_1 x_{1i} + \epsilon_i$

The hypothesis being tested by the above model is
that the intercepts and the slopes are the same; that is,
the lines are coincident:

      $H_0$: $\beta_2 = \beta_3 = 0$

      $H_a$: $\beta_2 \neq 0$ and/or $\beta_3 \neq 0$

As before, during hypothesis testing, if a $1 - \alpha$
confidence interval about the population parameter contains
zero, the null hypothesis cannot be rejected at the $\alpha$
significance level.

Figure 3.5 graphically explains the meaning of the
population parameters: $\alpha$, $\beta_1$, $\beta_2$, and $\beta_3$. $\beta_2$ is the
difference in y-intercepts; $\beta_3$ shows how much greater or
smaller the slope is for the separate populations.


Figure 3.5.  Illustration of the Meaning of Regression
             Parameters with an Indicator Variable and an
             Interaction Term (12:305)
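
The meaning of the four parameters can be illustrated with a small sketch. It relies on a known property of the interaction model: because each population is allowed its own intercept and slope, the pooled least-squares fit reproduces the two separately fitted lines. The function names and the numbers are hypothetical and are not fatigue data.

```python
def fit_line(xs, ys):
    """Closed-form least-squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b * x_bar, b


def indicator_estimates(stainless, cast_iron):
    """Estimates of (alpha, beta_1, beta_2, beta_3); each argument is an (xs, ys) pair."""
    a_0, b_0 = fit_line(*stainless)      # indicator x_2 = 0
    a_1, b_1 = fit_line(*cast_iron)      # indicator x_2 = 1
    # beta_2 is the difference in intercepts, beta_3 the difference in slopes
    return a_0, b_0, a_1 - a_0, b_1 - b_0


# Hypothetical observations for the two populations.
stress = [1.0, 2.0, 3.0, 4.0, 5.0]
stainless = (stress, [2.2, 4.1, 6.2, 7.9, 10.1])
cast_iron = (stress, [3.0, 4.4, 6.1, 7.4, 9.1])
alpha_hat, beta_1_hat, beta_2_hat, beta_3_hat = indicator_estimates(stainless, cast_iron)
```

Here the estimates are approximately 0.22, 1.96, 1.22, and -0.44. Testing whether the last two differ from zero additionally requires the pooled mean square error, which this sketch does not compute.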

Nonlinear Regression

Up  to  this  point,  the  only  relationship  between  y 
and  x  which  has  been  discussed  is  the  linear  form.  Many 
times,  however,  a  linear  model  cannot  reasonably  fit  the 
data.  In  these  situations  it  may  be  more  realistic  to  fit 
a  model  of  the  nonlinear  form.  A  nonlinear  model  is  one 
that  is  nonlinear  in  its  parameters,  such  as  the  following 
two  examples: 

®1  2 

<  26)  y  «  e(—  +  S2x  +  £) 


(  27) 


Y  = 


rrr:<sinBix 


COS32x)  +  e 


Model (26) can be transformed by taking logarithms
of both sides. The transformation produces a model that is
linear in the parameters:


(28) 


$  i  2 

ln  y  =  —  +  s2x  + 


e 


Model (27) can also be transformed, but any resulting
transformation leaves the parameters still in the nonlinear
form. This model is said to be intrinsically nonlinear. The
concepts of nonlinear regression will be illustrated using
the intrinsically nonlinear model below:

(29)  $y = \alpha + e^{\beta x} + \epsilon$;  $\alpha \neq 0$

Nonlinear Parameter Estimation

As in the linear case, in order to fit the best
regression line to the data, the parameters, $\alpha$ and $\beta$, must
be estimated. The model to be used has, at this point, been
hypothesized from actual knowledge of the true model form
or by some judicious guessing based on trends in the data
(4:264).

As with the linear model, the least-squares technique
can often provide reasonable estimates of the parameters,
provided the error term, $\epsilon$, follows all of the assumptions
stated above for the linear model. Again, a and b are
estimates of regression parameters, $\alpha$ and $\beta$, respectively.

The  procedure  for  estimating  a  and  b  follows: 


Method of Least Squares Estimation for the Nonlinear Model

(30)  Minimize  $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - e^{b x_i})^2$

Differentiating the above quantity with respect to
each of the parameters and setting each derivative equal to
zero produces the two normal equations:

(31)  $\sum_{i=1}^{n} (y_i - a - e^{b x_i}) = 0$

(32)  $\sum_{i=1}^{n} (y_i - a - e^{b x_i})(x_i e^{b x_i}) = 0$

(4:266)

The normal equations can then be solved for a and b
to find the least-squares estimates of α and β. Because the
equations are nonlinear in b, an iterative method is used to
estimate the parameters for the example. The method involves
iteratively modifying the parameter estimates so as to reduce
the sum-of-squares. Iterations continue until the
sum-of-squares function changes by less than a specified
amount or until the parameter estimates change by less than a
specified amount.

The Newton-Raphson technique is an iterative parameter
estimation method. It involves eliminating a from the normal
equations, producing a single equation in b of the form
f(b) = 0. An initial value for b is guessed and then
corrected on each iteration. The corrected value of b then
becomes the starting estimate for b in the next iteration.
Iterations continue until an iteration produces a change in
b of less than a specified amount. Specifically, let

$b_i$ = the estimate of b after i iterations
$b_0$ = the initial guess of b
$h_i$ = the correcting element for the ith iteration
$f(b_{i-1})$ = f(b) evaluated at $b_{i-1}$
$f'(b_{i-1})$ = the derivative of f(b) with respect
to b, evaluated at $b_{i-1}$

Each iteration then computes $h_i = f(b_{i-1}) / f'(b_{i-1})$ and
$b_i = b_{i-1} - h_i$, which is the pattern followed by the
values in Table 3.1.


The following example serves to illustrate a nonlinear
parameter estimating technique.

Data:   x:   0    1    2    3    4
        y:   10   13   18   29

Hypothesized Regression Model:

$y = \alpha + e^{\beta x} + \varepsilon$


TABLE 3.1
Iteration Summary

Iteration   b_{i-1}        f(b)         f'(b)      h_i     b_i

    1        1.50     2,072,703    18,022,273     0.12    1.38
    2        1.38       710,829     6,657,279     0.11    1.27
    3        1.27       246,868     2,506,615     0.10    1.17
    4        1.17        81,907     1,003,838     0.08    1.09
    5        1.09        25,995       457,382     0.06    1.03
    6        1.03        16,827       239,652     0.07    0.96
    7        0.96        -5,971        99,584    -0.06    1.02
    8        1.02         3,409       213,635     0.02    1.00
    9        1.00          -397       168,318     0.00    1.00

An  initial  estimate  of  1.50  produces  a  final  value 
for  b  of  1.00  through  nine  iterations  of  the  Newton-Raphson 
technique.  To  find  the  estimate  for  a,  substitute  the  final 
value  of  b  into  one  of  the  normal  equations.  For  this  example 
the  final  parameter  estimates  and  resulting  model  are: 


a = 9.84
b = 1.00

$\hat{y}_i = 9.84 + e^{x_i}$
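The Newton-Raphson iteration of the example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used to produce Table 3.1. The function names are hypothetical. The fifth y value is not legible in this copy of the data; y = 65 is assumed here only because it reproduces the reported a = 9.84 through normal equation (31) at b = 1.00. The f(b) computed below may also differ from the tabulated f(b) by a constant factor, which leaves the correction h unchanged.

```python
import math

# Example data; the last y value (65) is an ASSUMPTION, see the lead-in.
X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [10.0, 13.0, 18.0, 29.0, 65.0]


def a_given_b(b, x=X, y=Y):
    """Solve normal equation (31) for a with b held fixed."""
    return sum(yi - math.exp(b * xi) for xi, yi in zip(x, y)) / len(x)


def f(b, x=X, y=Y):
    """Left side of normal equation (32) after substituting a(b)."""
    a = a_given_b(b, x, y)
    return sum((yi - a - math.exp(b * xi)) * xi * math.exp(b * xi)
               for xi, yi in zip(x, y))


def f_prime(b, x=X, y=Y):
    """Analytic derivative of f with respect to b."""
    a = a_given_b(b, x, y)
    w = [xi * math.exp(b * xi) for xi in x]   # d/db of exp(b * x_i)
    w_mean = sum(w) / len(x)                  # equals -da/db
    total = 0.0
    for xi, yi, wi in zip(x, y, w):
        resid = yi - a - math.exp(b * xi)
        total += (w_mean - wi) * wi + resid * xi * wi
    return total


def newton_raphson(b0, tol=1e-10, max_iter=100):
    """Iterate b_i = b_{i-1} - h_i with h_i = f / f' until |h_i| < tol."""
    b = b0
    for _ in range(max_iter):
        h = f(b) / f_prime(b)   # correcting element h_i
        b -= h
        if abs(h) < tol:
            break
    return a_given_b(b), b


a_hat, b_hat = newton_raphson(1.50)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")
```

Because the iteration here runs to a much tighter tolerance than two decimal places, the final a can differ slightly from the value obtained by substituting the rounded b = 1.00 into a normal equation.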

However,  it  is  sometimes  difficult  to  eliminate  a 
parameter  by  simultaneously  solving  the  normal  equations,  as 
in  the  Newton-Raphson  technique.  For  this  reason,  other 
methods  exist  for  estimating  the  parameters  of  a  nonlinear 
model.  Two  such  methods — Gauss'  Method  and  Marquardt's 
Method — are  discussed  below. 

Gauss'  Method,  also  called  the  Linearization  Method, 
involves  expanding  the  original  nonlinear  function  in  a 
Taylor  series  near  the  initial  estimates  of  the  parameter 
values,  and  ignoring  all  higher  order  terms: 

(33)  $y \approx y_0 + \sum_{k=1}^{m} \left. \frac{\partial y}{\partial b_k} \right|_{b = b_0} (b_k - b_{k0})$

where

$y_0$ = the original function evaluated
at the initial parameter estimates

$m$ = the number of parameters

$\left. \partial y / \partial b_k \right|_{b = b_0}$ = the partial derivatives of the
original function with respect
to each parameter, evaluated at
the initial parameter estimates

$b_k$ = the kth parameter

$b_{k0}$ = an initial estimate of the kth
parameter

(17:5)

The parameters are now in linear form. Linear
regression techniques can be used to find estimates of
$b_k - b_{k0}$ and hence of the parameter values $b_k$. These revised
estimates are then used as the initial estimates in the
next iteration. Iterations continue until the sum-of-squares
function changes by less than a pre-specified tolerance or
until another stopping mechanism is activated.
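The linearization steps described above can be sketched for the example model y = a + e^(bx). This is only an illustrative sketch: the function name, the starting values, and the noise-free data (generated from a = 10, b = 1) are all hypothetical, and no step-size safeguard of the kind found in production routines is included.

```python
import math


def gauss_newton(x, y, a0, b0, tol=1e-10, max_iter=50):
    """Gauss (linearization) iterations for y = a + exp(b * x)."""
    a, b = a0, b0
    n = len(x)
    for _ in range(max_iter):
        # Residuals and partial derivatives at the current estimates.
        r = [yi - (a + math.exp(b * xi)) for xi, yi in zip(x, y)]
        d = [xi * math.exp(b * xi) for xi in x]   # dy/db (dy/da is 1)
        # Linear least squares for the corrections (da, db),
        # solved from the 2x2 normal equations of the linearized model.
        s_d = sum(d)
        s_dd = sum(di * di for di in d)
        s_r = sum(r)
        s_dr = sum(di * ri for di, ri in zip(d, r))
        det = n * s_dd - s_d * s_d
        da = (s_dd * s_r - s_d * s_dr) / det
        db = (n * s_dr - s_d * s_r) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b


# Hypothetical noise-free data generated from a = 10, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [10.0 + math.exp(xi) for xi in xs]
a_est, b_est = gauss_newton(xs, ys, a0=9.0, b0=1.2)
print(a_est, b_est)
```

Unlike the Newton-Raphson example, no parameter has to be eliminated from the normal equations first; both corrections come from one linear regression per iteration.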

A  second  nonlinear  parameter  estimation  technique  is 
Marquardt's  Method.  It  is  an  improvement  over  Gauss'  Method 
in  that  it  does  not  converge  as  slowly  as  Gauss'  Method  when 
it  approaches  the  least-squares  estimate  of  the  solution. 

Tests  for  the  Nonlinear  Model 

Even though it does not always lead to an unbiased
estimate, the MSE is often used as an estimate of $\sigma_\varepsilon^2$
(4:283). MSE for the nonlinear model is the unexplained
variation (sum of squared errors) divided by n-m degrees of
freedom, where n equals the number of observations in the
sample and m is the number of parameters to be estimated.
MSE can then be used when comparing the nonlinear model
to other nonlinear models and to linear models.


As with the linear model, confidence intervals can
be constructed around the nonlinear parameter estimates,
using the t-statistic (7:78) and the estimate of $\sigma_\varepsilon^2$.
However, testing whether a 1-α confidence interval for a
certain nonlinear parameter estimate contains zero may not
have the same significance as testing the slope coefficient
in the linear model, since the nonlinear model may not have
a regression coefficient associated with slope.

Both the BMD Biomedical Computer Programs (3:215) and
the Statistical Package for the Social Sciences (SPSS)
(13:320) provide linear and nonlinear regression subprograms.

Summary 

Regression  is  a  curve-fitting  technique  which  describes 
a  relationship  between  two  or  more  variable  quantities.  The 
linear  regression  model  parameters  can  be  estimated  using  the 
technique  of  least-squares.  The  appropriateness  of  the 
regression  model  is  tested  using  statistical  hypothesis  testing. 
Confidence  and  prediction  intervals  can  also  be  used  to  test 
the regression model. Separate regression lines can be
compared as to their slopes and/or y-intercepts.

In addition, nonlinear regression model parameters
can also be estimated by a least-squares technique; in this
case, iterative methods must be used to approximate a final
solution. A biased estimate of $\sigma_\varepsilon^2$ for the nonlinear model
is MSE. It is the quantity used to determine which regression
model (linear or nonlinear) best fits the data.


CHAPTER  4 


METHODOLOGY 


Introduction 


This chapter discusses the role that the statistical
methods of Chapter 3 can play in analyzing the mechanical
properties of materials, some of which were discussed in
Chapter 2. An example with titanium alloy fatigue data is
used. These data are the result of fatigue tests performed on
various conditions of the titanium alloy Ti-6Al-4V and
another titanium alloy, Corona 5. All the data were supplied,
in tabular and graphical form, by the Materials Laboratory,
AFWAL, Wright-Patterson AFB OH.

The twenty-eight different conditions of Ti-6Al-4V,
the titanium alloy discussed in Chapter 2, and Corona 5 to
be statistically analyzed are labeled as follows:


1.  DFVLR MIXED

2.  DFVLR FINE

3.  DFVLR COARSE

4.  CORONA 5

5.  REP LOW TUNGSTEN (W)

6.  REP LOW W HEAT TREAT #1

7.  SEP LOW W

8.  SEP LOW W HEAT TREAT #1

9.  REP HIGH W

10.  REP HIGH W HEAT TREAT #1

11.  SEP HIGH W

12.  SEP HIGH W HEAT TREAT #1

13.  REP HYDROVAC

14.  AS CAST

15.  AS CAST HEAT TREAT #1

16.  CAST & HIP (Hot Isostatic Press)

17.  CAST & HIP HEAT TREAT #1

18.  R14

19.  R15

20.  R16

21.  HR15

22.  HR16

23.  HR17

24.  REP + W(IMI) CONDITION 1: As HIP

25.  REP + W(IMI) CONDITION 2:
     HIP + 960°C (1760°F) for 1 hr.,
     Water Quench (WQ)

26.  REP + W(IMI) CONDITION 3:
     Hot Worked + Simulated HIP Thermal Cycle 917°C
     (1683°F) for 4 hrs., Furnace Cooled (FC)

27.  REP + W(IMI) CONDITION 4:
     Hot Worked + 960°C (1760°F) for 1 hr.,
     WQ + 700°C (1292°F) for 2 hrs., AC

28.  REP + W(IMI) CONDITION 5:
     HIP + 1038°C (1900°F) for 1 hr.,
     FC + 732°C (1350°F) for 4 hrs., AC


The present procedure represents fatigue data by a
two-dimensional graph, with the logarithm of the number of
cycles to failure (N) on the abscissa (horizontal) axis and
stress level (S) on the ordinate (vertical) axis. The data
points are plotted and a curve approximating the relationship
between S and N is fitted subjectively, according to the
appearance of the data points, by eye or by means of a French
curve.


Statistical  Analyses  Performed 

The statistical analyses performed on the fatigue
data include fitting two regression models to the data: a
linear model, $y_i = a + b x_i$, and a nonlinear model,
$y_i = a + b c^{x_i}$. (This nonlinear model was deemed most
appropriate for the data after analyzing several alternative
nonlinear models.) Typically, when data are presented on a
two-dimensional plot, the dependent variable values are read
from the vertical axis and the independent variable values
from the horizontal axis. For this work, the independent
variable is stress level, depicted on the vertical axis,
because during fatigue testing the level of stress is held
constant and a value of cycles to failure is determined.
The level of stress is then changed and another value of
cycles is determined.

The criterion used for determining whether the linear
or nonlinear model provides a better fit to the data is MSE.
Recall that MSE is an unbiased estimator of the variance, $\sigma_\varepsilon^2$,
for the linear model. Although MSE is not an unbiased
estimate of the variance for the nonlinear model, it is an
estimate and can be used for comparison purposes (3:282). The
degrees of freedom associated with MSE for the nonlinear model
are n-3 rather than the n-2 used in the linear model, because
three nonlinear parameters are being estimated. The smaller
MSE denotes the better fitting model.
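The MSE criterion above can be illustrated with a short sketch. The data, the parameter values, and the function names are hypothetical, and the nonlinear parameters are simply supplied rather than estimated; the point is only that each model's sum of squared errors is divided by its own degrees of freedom (n-2 or n-3) before the comparison.

```python
def mse(y, y_hat, n_params):
    """Sum of squared errors divided by n - m degrees of freedom."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sse / (len(y) - n_params)


def fit_line(x, y):
    """Closed-form least-squares estimates (a, b) for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data that follow y = 4 + 3 * 0.5**x exactly.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [4.0 + 3.0 * 0.5 ** xi for xi in x]

a, b = fit_line(x, y)
mse_linear = mse(y, [a + b * xi for xi in x], n_params=2)

# Nonlinear parameters are assumed known here, not fitted.
b1, b2, b3 = 4.0, 3.0, 0.5
mse_nonlinear = mse(y, [b1 + b2 * b3 ** xi for xi in x], n_params=3)

best = "LINEAR" if mse_linear < mse_nonlinear else "NONLINEAR"
print(mse_linear, mse_nonlinear, best)
```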

Residual analysis is also performed to assess the
reasonableness of the statistical assumptions. This analysis
also leads to the detection of extreme values and their
potential identification as outliers. The assumption of
normality is important due to the relatively small number
(4 to 16) of data points for each condition tested. The
statistical distribution function describing the fatigue life
at constant stress cannot be accurately determined from these
data, since doing so would require testing more than 1000
identical specimens in an identical environment at a constant
stress (2:409). However, a German experimenter, Muller-Stock,
tested 200 steel specimens at a single stress and found the
distribution to be normal when N is expressed as log N
(2:409). The finding of Muller-Stock coincides well with
statistical Assumption 4, from Chapter 3, which assumed that
the errors were normally distributed about the mean.

An analysis was done to examine whether a statistical
difference exists between the fatigue strengths of different
conditions of Ti-6Al-4V. Personnel of the Materials
Laboratory were interested in determining, for example,
whether Heat Treatment #1 improved the fatigue strength of
REP HIGH W.

The samples in each of the following comparisons were
hypothesized by Materials Laboratory personnel (15) to be
drawn from the same population:

1.  REP  LOW  W  and  REP  HIGH  W 

2.  SEP  LOW  W  and  SEP  HIGH  W 

3.  AS  CAST  and  CAST  &  HIP 

4.  DFVLR  FINE  and  DFVLR  COARSE 

5.  DFVLR  MIXED  and  DFVLR  COARSE 

6.  DFVLR  FINE  and  DFVLR  MIXED 

7.  REP  HIGH  W  and  REP  HIGH  W  HEAT  TREAT  #1 

8.  REP  LOW  W  and  REP  LOW  W  HEAT  TREAT  #1 

9.  SEP  HIGH  W  and  SEP  HIGH  W  HEAT  TREAT  #1 

10.  SEP  LOW  W  and  SEP  LOW  W  HEAT  TREAT  #1 

11.  AS  CAST  and  AS  CAST  HEAT  TREAT  #1 

12.  CAST  &  HIP  and  CAST  &  HIP  HEAT  TREAT  #1 

13.  R14  and  R15  and  R16 

14.  HR15  and  HR16  and  HR17 

15.  IMI CONDITIONS 1, 2, 3, 4, and 5


Computer  Support 


Computer support was supplied by the ASD Computer
Center. Regression analysis, residual analysis, and
comparison of regressions were performed using the Statistical
Package for the Social Sciences (SPSS) system of computer
programs. Nonlinear regression used the SPSS NONLINEAR
Subprogram, and linear regression used the SPSS REGRESSION
Subprogram.

Summary 

This chapter presented the statistical techniques to
be used in analyzing the titanium alloy fatigue data. In
Chapter 5, inferences are made regarding the best-fit line for
the linear and nonlinear model for seventeen of the
twenty-eight conditions tested. The remaining eleven
conditions were analyzed using only the linear model. Residual
analysis detected potential outliers; data points more than
two standard deviations from the fitted line were selected for
examination as possible outliers. Each extreme value was
excluded from its sample, a new regression line was fitted,
and prediction intervals were constructed about the new line.
The extreme value was then treated as a new observation. If
this point did not lie within the interval, it was denoted as
an outlier.

When comparing regression lines, several comparisons were
made using the nonlinear model, while the remaining
comparisons used the linear model. For a few linear model
comparisons, the slopes were assumed to be equal. This
assumption was made because observation of the plotted
fitted regression lines indicated that the slopes were
similar.


CHAPTER  5


RESULTS  AND  ANALYSIS 


Introduction 

This chapter shows the application of statistical
methods to mechanical properties of materials; in this
example, titanium alloy fatigue data. (The fatigue data are
listed in Appendix A.) The Materials Laboratory, AFWAL,
supplier of the data, was seeking the answers to three
questions:

1.  Can  better-fitting,  more  accurate  S-N  curves  be 
determined  using  statistical  methods? 

2.  Which  of  the  data  are  outliers? 

3.  How  does  the  fatigue  data  from  various  conditions 
compare? 

This  chapter  answers  those  questions  by  presenting 
the  results  of  the  regression,  residual,  and  comparison  of 
regressions  analyses. 

Results 

The results of SPSS regression and residual analyses
for all twenty-eight conditions of Ti-6Al-4V and Corona 5 are
presented in Table 5.1. The results are presented in the
following fashion. For the linear model, a is the
least-squares estimate for α, the y-intercept; b is the
least-squares estimate for β, the slope; SSE is the sum of
squared errors; MSE is mean square error; 2 SD Outliers are
those data points which lie more than two standard deviations
from the fitted regression line value (the ordered pair
listed, if any, is in the form ($x_i$ in thousands of psi,
$y_i$ in number of cycles)); $R^2$ is the proportion of total
variation that is explained by the regression line (SSR/SST);
F is the statistic (MSR/MSE), with 1 and n-2 degrees of
freedom, used to test $H_0$: β = 0 (the regression line does
not help explain the variation in y); and Significance
denotes the significance level associated with the calculated
F-value.

To illustrate the methodology used in the analysis,
four representative sets of results (DFVLR COARSE, AS CAST,
R14, and R15) will be discussed.

For this work, an $R^2$ of 0.70 or greater will be
considered "good"; that is, the linear model explains the
majority (seventy percent or greater) of the variation of the
$y_i$'s. Sometimes the nonlinear model was not developed when
the linear model indicated a good fit. A significance level
of .050 will be the criterion for rejecting $H_0$: β = 0.

For the nonlinear model, $b_1$, $b_2$, and $b_3$ are the final
estimates of the regression coefficients; SSE is the sum of
squared errors; MSE is mean square error; and 2 SD Outliers
denotes the data points which lie more than two standard
deviations from the fitted regression line value.

For  both  models  the  grouping  of  conditions  in  Table 
5.1  has  no  significance;  they  are  randomly  grouped. 


Analysis  of  Four  Representative  Conditions 


DFVLR COARSE. On the basis of lower MSE (.1394 versus .1557),
the linear model fits the data better than the nonlinear
model. The resulting model for DFVLR COARSE is
$\log y_i = 7.965 - .321 \times 10^{-4} x_i$. The regression line explains
78.52 percent of the variation. An F-value of 32.904
corresponds to a significance level of less than .001;
therefore, we are more than 99.9 percent confident that the
null hypothesis is rejected correctly. A strong linear
relationship between log y and x exists for this sample.
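For a simple linear regression, the reported $R^2$ and F are tied together: since $R^2$ = SSR/SST and F = MSR/MSE with 1 and n-2 degrees of freedom, F = (n-2) $R^2$ / (1 - $R^2$). The sketch below uses that identity as a consistency check on the reported values. The function name is hypothetical, and the sample sizes are not stated in this section, so the error degrees of freedom are recovered from the reported figures rather than assumed.

```python
def implied_error_df(f_value, r_squared):
    """Solve F = df * R^2 / (1 - R^2) for the error degrees of freedom."""
    return f_value * (1.0 - r_squared) / r_squared


# Reported linear-model values for two of the conditions discussed.
df_dfvlr_coarse = implied_error_df(32.904, 0.7852)
df_r14 = implied_error_df(123.406, 0.9611)

print(round(df_dfvlr_coarse, 2), round(df_r14, 2))
```

A result close to a whole number suggests that the reported F and $R^2$ agree with each other; a result far from one would point to a transcription error in one of the two.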

AS CAST. The nonlinear model (MSE = .0589) fits this set of
data better than does the linear model (MSE = .0636). The
resulting model is $\log y_i = 2.063 + 2.250(.897)^{x_i}$.
Observing the linear regression results, with an $R^2$ of .8766,
an F-value of 99.433, and a significance level of less than
.001, the linear regression line explains more than
eighty-seven percent of the total variation and demonstrates a
relationship between log y and x. Hence, since the nonlinear
model exhibits less variability (smaller MSE), a nonlinear
relationship exists between log y and x.

R14. The linear model (MSE = .0472) fits the R14 sample data
better than the nonlinear model (MSE = .0597). The R14 linear
model is $\log y_i = 9.415 - .481 \times 10^{-4} x_i$. Ninety-six and eleven
hundredths percent of the total variation is explained by the
regression line. An F-value of 123.406 with an associated
significance level of .000 causes $H_0$: β = 0 to be rejected.
A strong linear relationship exists between $\log y_i$ and $x_i$.

R15. The linear model (MSE = .2978) fits the R15 data better
than the nonlinear model (MSE = .3476). The R15 linear model
is $\log y_i = 7.646 - .223 \times 10^{-4} x_i$. An $R^2$ value of .3455 means
less than thirty-five percent of the total variation of the
$y_i$'s is explained by the regression line. An F-value of
3.696 and a significance level of .096 mean the null
hypothesis cannot be rejected. Hence, at the five percent
significance level, neither the linear nor the nonlinear
regression model adequately represents the relationship
between $\log y_i$ and $x_i$.

Other conditions. Other conditions are discussed here as
special cases in linear and nonlinear model comparison. For
REP HIGH W, CAST & HIP, and CORONA 5, the linear model
provided the better fit; however, the $R^2$ values were less than
.7000, leaving at least thirty percent of the variation
unexplained. The slope coefficient estimates are significant,
so the linear model is still more adequate than $y = \bar{y}$. More
variation is explained by the slightly sloped line than by the
horizontal line.

The  nonlinear  model  fit  the  HR17  data  better  on  the 
basis  of  MSE.  The  linear  model  slope  coefficient  was  not 
significant;  therefore,  since  the  difference  in  MSE  is  small, 


the  nonlinear  parameter  estimates  cannot  be  assumed  to 
be  significant. 

The linear model fit the REP HYDROVAC data better than
the nonlinear model; however, the $R^2$ value is below .7000 and
the slope coefficient estimate is not significant. Hence the
linear model is not an adequate model for this data.

Table 5.2 provides a summary of seventeen of the
conditions. It indicates the best-fit model based on MSE,
and the resulting numerical model. The remaining eleven
conditions were fitted only with the linear model. They all
had $R^2$ values in excess of .7000.

Although  all  the  sample  sizes  are  small,  the  number 
of  data  points  in  IMI  Conditions  2  and  5  are  especially  small 
(four  and  five  respectively) .  For  this  reason,  the  variance 
of  the  coefficients  is  great  and  the  resulting  confidence 
intervals  are  so  large  that  they  do  not  reflect  the  actual 
distribution  of  the  coefficients.  It  is  suggested  that 
statistical  analyses  like  the  ones  used  in  this  paper  not  be 
applied  to  conditions  with  such  a  small  sample  size. 

Residual  Analysis  Results  and  Outliers 

From Table 5.1, nine data points were identified as
potential outliers because they lay more than two standard
deviations from the fitted regression line value. The extreme
values came from seven different conditions: DFVLR FINE
(LINEAR), DFVLR COARSE (LINEAR and NONLINEAR), DFVLR MIXED
(LINEAR), REP LOW W (NONLINEAR), SEP LOW W (NONLINEAR), AS
CAST (LINEAR and NONLINEAR), and R16 (LINEAR).


TABLE 5.2

Summary of Regression Analysis

CONDITION            BEST-FIT MODEL            RESULTING MODEL

DFVLR FINE           NONLINEAR                 log y_i = 3.731 + .732(.628)^{x_i}
DFVLR COARSE         LINEAR                    log y_i = 7.965 - .321x10^-4 x_i
DFVLR MIXED          NONLINEAR                 log y_i = 3.738 + .697(.612)^{x_i}
REP HYDROVAC         NEITHER MODEL ADEQUATE
REP LOW W            NONLINEAR                 log y_i = 3.604 + 1.028(.756)^{x_i}
REP HIGH W           LINEAR                    log y_i = 6.641 - .199x10^-4 x_i
SEP LOW W            NONLINEAR                 log y_i = 4.032 + .783(.617)^{x_i}
SEP HIGH W           LINEAR                    log y_i = 7.048 - .234x10^-4 x_i
AS CAST              NONLINEAR                 log y_i = 2.063 + 2.250(.897)^{x_i}
CAST & HIP           LINEAR                    log y_i = 6.960 - .248x10^-4 x_i
CORONA 5             LINEAR                    log y_i = 8.037 - .302x10^-4 x_i
R14                  LINEAR                    log y_i = 9.415 - .481x10^-4 x_i
R15                  NEITHER MODEL ADEQUATE
R16                  NONLINEAR                 log y_i = 4.191 + .322(.512)^{x_i}
HR16                 NONLINEAR                 log y_i = 4.437 + .004(.054)^{x_i}
HR17                 NEITHER MODEL ADEQUATE
REP HIGH W HT #1     NONLINEAR                 log y_i = 4.025 + .343(.492)^{x_i}


The data points were removed from their samples and
new regression lines were determined. The omitted points were
then treated as new observations, and ninety-five percent and
ninety-nine percent prediction intervals, roughly
corresponding to two and three standard deviations, were
calculated. If the new observation did not lie within the
ninety-nine percent prediction interval, it was identified as
an outlier and deleted from further statistical analysis, but
submitted for fractographic analysis.
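The outlier procedure described above can be sketched for the linear model as follows. This is an illustration under stated assumptions, not the SPSS computation: the function names and the data are hypothetical, the fit is done in log10(cycles), and the critical t value is passed in from a table (4.032 is the two-sided ninety-nine percent value for 5 degrees of freedom) instead of being computed.

```python
import math


def prediction_interval(x, y, x_new, t_crit):
    """Prediction interval for a new observation at x_new, y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # root MSE
    half = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    y_hat = a + b * x_new
    return y_hat - half, y_hat + half


def is_outlier(x, y, index, t_crit):
    """Refit without point `index`, then test it as a new observation."""
    x_rest = x[:index] + x[index + 1:]
    y_rest = y[:index] + y[index + 1:]
    low, high = prediction_interval(x_rest, y_rest, x[index], t_crit)
    return not (low <= y[index] <= high)


# Hypothetical sample: stress in ksi and log10(cycles to failure).
stress = [60.0, 65.0, 70.0, 75.0, 80.0, 85.0, 90.0, 95.0]
log_n = [6.20, 6.02, 5.93, 5.74, 5.61, 6.60, 5.29, 5.16]

T_99_DF5 = 4.032   # tabled t value, 7 remaining points, n - 2 = 5
flag_extreme = is_outlier(stress, log_n, 5, T_99_DF5)
flag_typical = is_outlier(stress, log_n, 0, T_99_DF5)
print(flag_extreme, flag_typical)
```

Limits computed in log units can be converted to cycles with `10 ** low` and `10 ** high`, which is the form in which prediction intervals are listed in Table 5.3.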

An  example  of  a  plot  of  the  fitted  line  with  associated 
ninety-nine  percent  prediction  bands  is  shown  in  Figure  5.1. 

The  example  uses  DFVLR  FINE  data  and  presents  the  nonlinear 
fitted  model.  The  remaining  conditions'  fitted  lines  and 
ninety-nine  percent  prediction  bands  are  presented  in 
Appendix  B. 

The results of this second regression analysis and the
ninety-five percent and ninety-nine percent prediction
intervals determined at the $x_i$ corresponding to the outlier
value are presented in Table 5.3. The prediction intervals
are presented in the form (lower limit, upper limit of y).

From Table 5.3 it can be observed that the REP LOW W
and SEP LOW W extreme values lie within the ninety-nine
percent prediction intervals and are therefore not classified
as outliers. The original regression models presented in Table
5.2 are retained.

Figure  5.1.  Plot  of  Fitted  Regression  Line  and  Associated
99%  Prediction  Bands


The extreme values for DFVLR COARSE (LINEAR and
NONLINEAR), AS CAST (LINEAR and NONLINEAR), DFVLR FINE
(LINEAR), DFVLR MIXED (LINEAR), and R16 (LINEAR) did not lie
within their prediction intervals and were discarded. The
exclusion of the outliers produces regression lines with less
variability and a better fit than the lines determined with
the outliers included.

Excluding the outliers causes DFVLR FINE, DFVLR MIXED,
and R16 to have different best-fit models than in Table 5.2.


TABLE 5.3

Regression Analysis Results with
Outlier Removed and Prediction Interval


                          DFVLR COARSE          AS CAST

LINEAR
  a                       8.479                 6.279
  b                       -.364x10^-4           -.229x10^-4
  SSE                     .4388                 .3311
  MSE                     .0548                 .0255
  2 SD Outliers           (75, 1931600)         (40, 616100)
  R^2                     .9243                 .9423
  F                       97.643                212.374
  Significance            .000                  .000
  95% P.I.                (94060, 1447426)      (4274, 22387)
    at x_i =              80,000                100,000
  99% P.I.                (50510, 2695401)      (3100, 30864)
    at x_i =              80,000                100,000

NONLINEAR
  b_1                     1.015                 1.777
  b_2                     3.751                 2.600
  b_3                     .906                  .915
  SSE                     .4107                 .2657
  MSE                     .0587                 .0221
  2 SD Outliers           NONE                  NONE
  95% P.I.                (95518, 1515083)      (4143, 19387)
    at x_i =              80,000                100,000
  99% P.I.                (52232, 2770656)      (3072, 26151)
    at x_i =              80,000                100,000


                          DFVLR FINE            DFVLR MIXED

LINEAR
  a                       6.980                 6.881
  b                       -.256x10^-4           -.248x10^-4
  SSE                     .4247                 .3633
  MSE                     .0472                 .0454
  2 SD Outliers           NONE                  NONE
  R^2                     .8846                 .8848
  F                       68.998                61.463
  Significance            .000                  .000
  95% P.I.                (58252, 734286)       (49789, 693217)
    at x_i =              65,000                65,000
  99% P.I.                (34112, 1253935)      (27352, 1261865)
    at x_i =              65,000                65,000


                          REP LOW W             SEP LOW W

NONLINEAR
  b_1                     3.345                 4.132
  b_2                     1.354                 .448
  b_3                     .780                  .466
  SSE                     1.3440                1.4381
  MSE                     .1222                 .1198
  2 SD Outliers           NONE                  NONE
  95% P.I.                (56171, 2454585)      (6476, 222513)
    at x_i =              90,000                100,000
  99% P.I.                (26667, 5170276)      (3261, 441828)
    at x_i =              90,000                100,000


                          R16

LINEAR
  a                       6.505
  b                       -.183x10^-4
  SSE                     .0499
  MSE                     .0083
  2 SD Outliers           NONE
  R^2                     .9456
  F                       104.378
  Significance            .000
  95% P.I.                (72323, 254520)
    at x_i =              75,000
  99% P.I.                (52311, 351891)
    at x_i =              75,000

DFVLR  COARSE  and  AS  CAST  retained  the  same  best-fit  models 
with  parameters  slightly  altered.  Table  5.4  presents  the 
new  best-fit  regression  lines. 

TABLE 5.4

Summary of Regression Lines With Outliers Deleted

CONDITION            BEST-FIT MODEL       RESULTING MODEL

DFVLR FINE           LINEAR               log y_i = 6.980 - .256x10^-4 x_i
DFVLR MIXED          LINEAR               log y_i = 6.881 - .248x10^-4 x_i
DFVLR COARSE         LINEAR               log y_i = 8.479 - .364x10^-4 x_i
AS CAST              NONLINEAR            log y_i = 1.777 + 2.600(.915)^{x_i}
R16                  LINEAR               log y_i = 6.505 - .183x10^-4 x_i

Testing  the  Random  Error  Assumptions

The assumptions of constant variance and normality
were examined using the residual plot produced by SPSS.
Normality was assumed if no more than five percent of the
residuals lay beyond the two standard deviation limits
about the mean. Table 5.5 presents the findings of the
examination of normality for the twenty-eight conditions
of Ti-6Al-4V and Corona 5. Note that the normality
assumption does not appear to be violated in any condition,
perhaps with the exception of DFVLR COARSE. However, due
to the small sample size, the single extreme value
represents a larger proportion than it would in a larger
sample. Thus there is no indication from the data that the
normality assumption is violated for any of the conditions.
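The five percent rule used above can be sketched as a short calculation. The residuals and the function name are hypothetical, and taking the standard deviation to be the root MSE of a two-parameter fit is an assumption about how the limits were drawn. The example also shows the small-sample effect noted in the text: with sixteen points, a single extreme residual already amounts to 6.25 percent.

```python
def percent_beyond_two_sd(residuals, n_params=2):
    """Percentage of residuals farther than two standard deviations from zero."""
    n = len(residuals)
    s = (sum(r * r for r in residuals) / (n - n_params)) ** 0.5   # root MSE
    beyond = sum(1 for r in residuals if abs(r) > 2 * s)
    return 100.0 * beyond / n


# Hypothetical residuals: fifteen small values and one extreme value.
residuals = [0.1, -0.1] * 7 + [0.1, 0.9]
pct = percent_beyond_two_sd(residuals)
print(pct)
```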

The  constant  variance  assumption  was  examined 
using  the  plot  of  residuals  versus  the  independent  variable. 
An  example  of  this  plot  for  REP  LOW  W  is  shown  in  Figure  5.2. 
Examination  of  the  similar  plots  constructed  for  the  other 
conditions  produced  no  observation  of  any  trends  in 
the  residuals.

TABLE 5.5

Examination For Normality

NUMBER OF POINTS          % OF POINTS
BEYOND 95% LIMITS         BEYOND 95% LIMITS


0 

0 

6.25 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 


E  regression  lines  were  made: 


ations  of  nonlinear  models,  (2) 


ions  of  linear  models — SEP  LOW  W 


th  SEP  HIGH  W,  and  the  six 




Figure  5.2.  Plot  of  Resi 


heat-treated  conditions  v 


ditions,  (3)  five  compari 


each  comparison  a  common 


comparison  of  linear  mode 


Assumptions  of  common  slo 


latter  two  sets  of  compar 


the  data  and  of  the  regre 


Nonlinear  comparisons, 


pairs  of  conditions  usi 
in  Table  5.6.  For  each 


pooled  and  the  following  nonlinear  model  was  used  for 
comparison: 

(34)  log  y.  =  ^  +  b4x2i  +  (b2  +  b^.)  (b3  +  bgx2i)xi 

This  form  is  identical  to  the  original  nonlinear 
model  except  that  a  dummy  variable,  x2 ,  and  three  parameters 
are  added.  x2  can  equal  zero  or  one  to  indicate  a  certain 
condition.  If  S4=35=6g=0,  then  we  have  no  evidence  to  suggest 
that  the  observations  were  from  different  populations.  Hence 
we  are  testing: 


H_:  S,,  #  0  and/or  ^  0  and/or  3,  /  0 
a  4  5  6 

In all comparisons, the ninety-five percent confidence intervals for b4, b5, and b6 contained zero, leading to the decision not to reject the null hypothesis. That is, we cannot reject, at the five percent level, the null hypothesis that both conditions within each comparison were drawn from the same population.
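
The decision rule used here (do not reject when every interval contains zero) can be written out as a short sketch. The function names are hypothetical, and the three intervals are those listed for the REP HIGH W / REP LOW W pair in the tabulated comparison results; which added parameter each interval belongs to is not needed for the rule.

```python
def contains_zero(interval):
    """True when the confidence interval (low, high) covers zero."""
    low, high = interval
    return low <= 0.0 <= high


def same_population_decision(intervals):
    """Apply the rule in the text: reject H0 only if some interval excludes zero."""
    if all(contains_zero(ci) for ci in intervals):
        return "do not reject H0"
    return "reject H0"


# Intervals listed for REP HIGH W and REP LOW W (x2=1)
rep_pair = [(-9.421, 9.688), (-9.551, 10.242), (-0.953, 0.777)]
print(same_population_decision(rep_pair))
```

Note that this checks each interval separately at the ninety-five percent level, as the text does; it is not a single joint test of the three added parameters.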

Linear  comparisons.  The  results  of  the  comparison  of  pairs 
of  conditions  using  the  linear  model  are  shown  in  Table  5.7. 
The  following  linear  model  was  used  for  comparison: 


[Table title and interval column headings lost in source]

CONDITIONS                                 n     95% CONFIDENCE INTERVALS

SEP HIGH W and SEP LOW W (x2=1)           25    (-478.882, 488.722)   (-488.754, 478.906)   (-1.028, 4.423x10[exponent lost])
REP HIGH W and REP LOW W (x2=1)           24    (-9.421, 9.688)       (-9.551, 10.242)      (-.953, .777)
AS CAST and CAST & HIP (x2=1)             29    (-8.283, 10.812)      (-10.615, 9.008)      (-.522, .393)
DFVLR FINE and DFVLR COARSE (x2=1)        22    (-13.469, 8.036)      (-7.668, 14.250)      (-.037, .594)
DFVLR MIXED and DFVLR COARSE (x2=1)       21    (-12.810, 7.363)      (-6.955, 13.604)      (-2.973x10^-4, .589)
DFVLR FINE and DFVLR MIXED (x2=1)         23    (-.631, .644)         (-.841, .771)         (-.269, .237)

Once  again  is  a  i
value  of  zero  or  one,  depen< 
five  percent  confidence  int< 


In  all  comparisons  1 
the  ninety-five  percent  coni 
the  null  hypothesis  cannot  fc 
CAST  &  HIP  and  CAST  &  HIP  HI 
null  hypothesis  at  the  five 
ninety-five  percent  confider 
tain  zero;  thus,  the  interce 
significantly  different  and 
can  be  expected  for  the  untr 
treated  condition  at  every  s 
The  definition  of  im 
When  the  slopes  are  equal,  t 
and  the  higher  line  is  great 
constant  amount.  Since  S-N 
log  N  the  constant  amount  is 
Moreover,  due  to  a  property 
equate  the  sum  of  logarithms 
[log  x  +  log  y  ≠  log(x  +  y)]
cycles  to  failure  is  not  a  c 
when  the  regression  line  is 


HT  #1 


Linear  comparisons  assuminc 


the  comparison  of  pairs  of 
linear  model  with  the  slop* 
in  Table  5.8.  The  form  of 
parison  of  the  heat-treate< 


(36)  log  yi  =  ; 


This  is  the  same  m< 


product  interaction  term  i: 
assumption.  The  ninety-fi' 


about  b2  tests  the  hypothe: 


&  HIP  with  CAST  &  HIP  HT  #1 


ST  &  HIP  HT  #1  fails  at  6252 
47  cycles,  an  improvement 
=  110,000  psi,  CAST  &  HIP  HT  #1 
SIP  at  671,143  cycles,  an 
3  cycles.  However,  the 
ain  constant  and  is  our 


which  did  not  contain  zero 
rejected  at  the  five  perce 
three  pairs .  SEP  LOW  W  HT 
ment  over  SEP  HIGH  W  HT  #1 
log  y  =  .538  improvement  o 
AS  CAST  HT  #1  showed  a  log 


HIP  HT  #1. 


The  form  of  the  li 
R14,  R15 ,  and  R16  is  as  fo 
(37)  log  yL  «  a  +  b2x2i 


TABLE 5.8

Linear Comparison of Heat-Treated Conditions

CONDITIONS                                       n     b2      H0: β2 = 0

SEP LOW W HT #1 (x2=1) and SEP HIGH W HT #1     16    .704    Reject
REP LOW W HT #1 (x2=1) and REP HIGH W HT #1     15    .538    Reject
AS CAST HT #1 and CAST & HIP HT #1              18    .615    Reject
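
As a check on how a b2 value of this kind arises, the common-slope comparison can be sketched for the AS CAST HT #1 and CAST & HIP HT #1 data listed in Appendix A. This is an illustration under stated assumptions, not the program used in the thesis: it assumes the model log y = a + b1*x1 + b2*x2 with x2 = 1 for AS CAST HT #1 (the table row carries no x2 marker), it assumes base-10 logarithms, and the function name common_slope_fit is hypothetical.

```python
import math

# (stress in psi, cycles to failure), from Appendix A
as_cast_ht1 = [(130000, 9400), (120000, 6840), (110000, 3260), (100000, 13840),
               (90000, 26450), (80000, 31700), (70000, 137200), (60000, 116620),
               (50000, 159800)]
cast_hip_ht1 = [(130000, 1960), (120000, 2770), (110000, 4500), (100000, 6710),
                (90000, 6460), (80000, 5720), (80000, 8200), (70000, 13400),
                (50000, 13180)]


def common_slope_fit(group1, group0):
    """Least squares fit of log10(y) = a + b1*x + b2*d, d = 1 for group1, 0 for group0."""
    def stats(group):
        xs = [x for x, _ in group]
        ys = [math.log10(y) for _, y in group]  # assumption: log base 10
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        return x_bar, y_bar, sxx, sxy

    x1, y1, sxx1, sxy1 = stats(group1)
    x0, y0, sxx0, sxy0 = stats(group0)
    b1 = (sxy1 + sxy0) / (sxx1 + sxx0)  # pooled within-group slope
    b2 = (y1 - y0) - b1 * (x1 - x0)     # vertical shift between the parallel lines
    a = y0 - b1 * x0                    # intercept of the d = 0 line
    return a, b1, b2


a, b1, b2 = common_slope_fit(as_cast_ht1, cast_hip_ht1)
life_ratio = 10 ** b2  # constant shift in log y is a constant ratio in cycles
print(f"b2 = {b2:.3f}, life ratio = {life_ratio:.2f}")
```

Under these assumptions the fitted shift comes out at about 0.615, in agreement with the tabulated value. Because the shift is constant in log y, it corresponds to a constant ratio of lives (about 4.1 here) at every stress level, not to a constant difference in cycles.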

The dummy variable x3 takes on a value of one when R16 data is entered; otherwise it is zero. Multiple hypotheses of no difference in the intercept for the three conditions are tested for β2 and β3. Table 5.9 presents the results of this comparison.


TABLE 5.9

Linear Comparison of R14, R15, and R16

CONDITIONS                              n     b2      H0: β2 = 0     b3       H0: β3 = 0

R14 and R15 (x2=1) and R16 (x3=1)      22    .383    NOT REJECT     -.259    NOT REJECT

nt  confidence  intervals  about 
e  null  hypotheses  cannot  be 
significance  level.  A  signif- 
ved  between  R14,  R15,  and  R16. 
r  model  used  for  comparison  of 
elows 

b3x3i  +  b4x4i  +  b5x5i  +  b1x1i

•NDITION  2  data  is  entered; 
tosen;  is  one  when  CONDITION  4 
!  when  CONDITION  5  is  selected. 

:  zero  change  in  intercept  are 
the  results  of  the  IMI  compari- 


TABLE  5.10

Linear  Comparison  of  IMI  CONDITIONS  1-5


The  null  hypotheses  ar 
4,  and  5  and  not  rejected  for 
shows  an  improvement  of  log  y 
CONDITION  2  shows  a  log  y  =  .2 
and  CONDITION  4  demonstrates  a 
CONDITION  1 .  CONDITION  3  was 
than  CONDITION  1. 

Linear  comparison  assuming  con 
the  comparison  of  HR15,  HR16, 
model  with  the  intercepts  assi 
in  Table  5.11.  The  form  of  t) 
parison  is  shown  below: 

(39)  log  yt  -  a  +  b]xli  +  1 

x2  equals  one  when  HR: 
equals  one  when  HR17  data  is  < 
confidence  intervals  about  b2 
zero  change  in  slope  between 
The  ninety-five  perce 
zero  for  b2  and  b3,  so  the  nu
at  the  five  percent  signifies 
significant  differences  in  si 


HR17 . 


TABLE 5.11

Linear Comparison of HR15, HR16, and HR17

CONDITIONS                                 n     b2              H0: β2 = 0     b3                                H0: β3 = 0

HR15 and HR16 (x2=1) and HR17 (x3=1)      20    -.323x10^-6     Not Reject     .165x10[exponent illegible]      Not Reject


Summary 

This  chapter  presented  regression  and  residual 
analysis  results  for  twenty-eight  conditions  of  Ti-6Al-4V  and
Corona  5.  Choice  of  the  better  fitting  model  was  made  on  the 
basis  of  smaller  MSE.  In  several  conditions,  neither  the 
linear  nor  the  nonlinear  model  adequately  fit  the  data.  Nine 
extreme  values  were  identified,  of  which  seven  were  classified 
as  greater-than-three  standard  deviation  outliers.  Residual 
plots  were  examined  to  test  the  assumptions  of  constant 
variance  and  normality.  Comparison  of  regression  lines  was 
performed  between  combinations  of  nonlinear  models  and 
between  combinations  of  linear  models. 
ious  data  sets  compare?

was  answered  through  regression
near  relationship  between  stress
tely  fit  by  either  model.
3  as  new  observations,  99  percent


prediction  intervals  (roue
were  calculated.  In  sever
the  sample,  identified  as
the  points  were  restored  t
statistical  analysis.

The  third  questioi
sion  results .  Two  or  mor«
analyzed,  using  dummy  var:
For  most  of  the  comparisoi
Although  this  exai
sample  size,  the  analysis
reproducible  models  for  r<
There  is  reason  t
cal  methods  could  be  exte
ses  where  a  relationship
These  methods  would  provi
for  describing  these  rela




search 

111  be  presented  in  two  sets: 
APPENDIX  A

FATIGUE  DATA 
DFVLR  COARSE


REP-Lo  W 


Stress Level (psi)     Number of Cycles to Failure
X                      Y

130,000                5,300
120,000                13,000
110,000                33,900
100,000                66,200
90,000                 94,200
80,000                 209,200
80,000                 37,400
75,000                 379,300
75,000                 1,931,600
130,000 (Retest)       6,500
100,000                71,400

CORONA  5 

130,000                7,200
120,000                28,900
110,000                17,400
110,000                50,300
100,000                471,300
100,000                528,000
95,000                 25,600
90,000                 57,300
87,500                 87,800
85,000                 1,499,700
80,000                 2,434,800
130,000                23,200
85,000                 1,393,000
80,000                 87,900
80,000                 112,300

Stress  Level 
(psi) 
x 


130,000 

120,000 

120,000 

115,000 

110,000 

110,000 

100,000 

100,000 

90,000 

90,000 

80,000 

130,000 

90,000 

140,000 

130,000 


REP-Hi  W 


Number  c 
Cycles  t 
Failure 
Y 

14,90 

15,00 

25.40 
32,10 
14,30 
30,50 

281,50 

283.90 

33.40 

704.90 
1,262,20 

9,60 

185,29 

7,80 

69,60 


19,20 

41.80 

19.80 
60,30 

78.70 
17,60 

639,60 

102,90 

255,20 

11.70 


120,000 

110,000 

100,000 

95,000 

90,000 

80,000 

80,000 

75,000 

75,000 

130,000 


(retest) 


R14 


CAST  (AS  CAST) 


:ress  Level 
(psi) 

X 

Number  of 
Cycles  to 
Failure 

y 

20,000 

4,100 

L0 , 000 

9,000 

00,000 

47,700 

0,000 

285,000 

0,000 

323,100 

0,000 

644,700 

5,000 

2,276,000 

R16 

130,000 

14,540 

120,000 

23,450 

L10 , 000 

22,200 

LOO, 000 

51,200 

)0,000 

58,500 

)0,000 

94,400 

20,000 

110,200 

L30,000  (ret< 

ist)  12,700 

75,000 

1,306,700 

Stress  Level 

Number  of 

(psi) 

Cycles  to 
Failure 

X 

y 

7,700 

9,000 

1,600 

16,000 

20,000 

24.200 

29.200 

32.800 
42,100 
52,300 
74,000 

156,700 

116,500 

616,100 

2,600 

2.800 


W  HT  #1 

30.650 
43,660 

12.650 
127,260 
304,580 

2,177,000 

2,185,730 
REP HIGH W HT #1                               SEP HIGH HT #1

Stress Level    Number of Cycles               Stress Level    Number of Cycles
(psi)           to Failure                     (psi)           to Failure
X               y                              X               y

130,000         9,000                          130,000         9,260
120,000         16,200                         130,000         11,200
110,000         18,000                         130,000         68,010
100,000         32,100                         110,000         16,520
90,000          13,000                         90,000          62,300
85,000          295,530                        85,000          183,600
80,000          292,380                        80,000          31,200
75,000          967,190                        70,000          105,100
70,000          111,380

AS CAST HT #1                                  CAST & HIP HT #1

130,000         9,400                          130,000         1,960
120,000         6,840                          120,000         2,770
110,000         3,260                          110,000         4,500
100,000         13,840                         100,000         6,710
90,000          26,450                         90,000          6,460
80,000          31,700                         80,000          5,720
70,000          137,200                        80,000          8,200
60,000          116,620                        70,000          13,400
50,000          159,800                        50,000          13,180
IMI CONDITION 2

Stress Level    Number of Cycles
(psi)           to Failure
X               y

130,620         11,436
130,620         15,630
130,620         7,067
123,370         35,296

IMI CONDITION 4

130,000         10,000
120,000         29,300
110,000         31,000
100,000         64,200
90,000          71,900
80,000          94,100
80,000          589,800


APPENDIX  B 

PLOTS  OF  FATIGUE  DATA  AND 
FITTED  S-N  CURVES 